THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

FOUNDED AND EDITED BY H. C. CARVER, 1930-1938 
EDITED BY S. S. WILKS, 1938-1949 

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XXI 


1950 



THE ANNALS 


OF MATHEMATICAL STATISTICS 


Editor 

T. W. ANDERSON 

Associate Editors 

R. C. BOSE 
W. FELLER 
M. A. GIRSHICK 
E. L. LEHMANN 
ALEXANDER M. MOOD 
JOHN W. TUKEY 

WITH THE COOPERATION OF 

M. S. Bartlett 
David Blackwell 
George W. Brown 
Harald Cramér 
William G. Cochran 
J. F. Daly 
W. Edwards Deming 
J. L. Doob 
Paul S. Dwyer 
Churchill Eisenhart 
T. E. Harris 
Paul G. Hoel 
Harold Hotelling 
Howard Levene 
William G. Madow 
H. B. Mann 
Frederick Mosteller 
J. Neyman 
H. E. Robbins 
S. N. Roy 
Henry Scheffé 
Walter A. Shewhart 
A. Wald 
Jacob Wolfowitz 
Max A. Woodbury 


Published quarterly by the Institute of Mathematical Statistics in March, June, 
September and December at Baltimore, Maryland 


INSTITUTE OF MATHEMATICAL STATISTICS 

General Office: Business Administration Building, University of Michigan, 
Ann Arbor, Michigan 

C. H. Fischer, Secretary-Treasurer 

This address should be used for all communications concerning 
membership, subscriptions, changes of address, back numbers, 
etc., but not for editorial correspondence. Changes in mailing 
address which are to become effective for a given issue should be 
reported to the Secretary on or before the 15th of the month 
preceding the month of that issue. 

Editorial Office: Department of Mathematical Statistics, Columbia University, 
New York 27, New York 

T. W. Anderson, Editor 

Manuscripts should be submitted to this address; each manuscript 
should be typewritten, double-spaced with wide margins, 
and the original copy should be submitted (preferably with one 
additional copy). Footnotes should be reduced to a minimum 
and whenever possible replaced by a bibliography at the end of 
the paper; formulae in footnotes should be avoided. Figures, charts, 
and diagrams should be professionally drawn on plain white 
paper or tracing cloth in black India ink twice the size they 
are to be printed. Authors are requested to keep in mind typographical 
difficulties of complicated mathematical formulae. 
Authors will ordinarily receive only galley proofs. Fifty reprints 
without covers will be furnished free. Additional reprints and 
covers furnished at cost. 

Subscription Price: $10.00 per year inside the Western Hemisphere; $5.00 elsewhere. 
Single issues $3.00. Back numbers are available at $10.00 per volume 
or $3.00 per single issue. 

Composed and Printed at the 
WAVERLY PRESS, Inc., Baltimore, Maryland, U. S. A. 


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879. 
Copyright, 1950, by the Institute of Mathematical Statistics. 


CONTENTS OF VOLUME XXI 

Articles 

Anderson, R. L., and T. W. Anderson. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted Fourier Series . . . 59 
Anderson, T. W., and R. L. Anderson. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted Fourier Series . . . 59 
Anderson, T. W., and Herman Rubin. The Asymptotic Properties of Estimates of the Parameters of a Single Equation in a Complete System of Stochastic Equations . . . 570 
Bahadur, Raghu Raj. On a Problem in the Theory of k Populations . . . 302 
Bahadur, Raghu Raj, and Herbert Robbins. The Problem of the Greater Mean . . . 469 
Barankin, E. W. Extension of a Theorem of Blackwell . . . 280 
Birnbaum, Z. W. Effect of Linear Truncation on a Multinormal Population . . . 272 
Birnbaum, Z. W., and D. G. Chapman. On Optimum Selections from Multinormal Populations . . . 443 
Blomqvist, Nils. On a Measure of Dependence between Two Random Variables . . . 593 
Carpenter, Osmer. Note on the Extension of Craig's Theorem to Non-Central Variates . . . 465 
Castellani, Maria. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis of Pareto's and Pearson's Curves . . . 289 
Chand, Uttam. Distributions Related to Comparison of Two Means and Two Regression Coefficients . . . 507 
Chapman, Douglas G. Some Two Sample Tests . . . 601 
Chapman, D. G., and Z. W. Birnbaum. On Optimum Selections from Multinormal Populations . . . 443 
Cohen, A. C., Jr. Estimating the Mean and Variance of Normal Populations from Singly Truncated and Doubly Truncated Samples . . . 557 
Davis, R. C. Derivation of a Broad Class of Consistent Estimates . . . 426 
Dixon, W. J. Analysis of Extreme Values . . . 488 
Erdös, P. Remark on my paper "On a Theorem of Hsu and Robbins" . . . 138 
Feller, W. Errata . . . 301 
Freeman, Murray F., and John W. Tukey. Transformations Related to the Angular and the Square Root . . . 607 
Glasgow, Mark O., and Robert E. Greenwood. Distribution of Maximum and Minimum Frequencies in a Sample Drawn from a Multinomial Distribution . . . 410 
Greenwood, Robert E., and Mark O. Glasgow. Distribution of Maximum and Minimum Frequencies in a Sample Drawn from a Multinomial Distribution . . . 410 
Greville, T. N. E. Remark on W. M. Kincaid's "Note on the Error in Interpolation of a Function of Two Independent Variables" . . . 137 



VOLUME INDEX 


Greville, T. N. E. Remark on the Article "On a Class of Distributions that Approach the Normal Distribution Function" by George B. Dantzig 
Grubbs, Frank E. Sample Criteria for Testing Outlying Observations . . . 27 
Gumbel, E. J., and R. D. Keeney. The Geometric Range for Distributions of Cauchy's Type . . . 133 
Gumbel, E. J., and R. D. Keeney. The Extremal Quotient . . . 523 
Gumbel, E. J., and H. von Schelling. The Distribution of the Number of Exceedances . . . 247 
Hammersley, J. M. The Distribution of Distance in a Hypersphere . . . 447 
Hodges, J. L., Jr., and E. L. Lehmann. Some Problems in Minimax Point Estimation . . . 182 
Howell, John M. Errata to "Control Chart for Largest and Smallest Values" . . . 615 
Katz, Leo. On the Relative Efficiencies of BAN Estimates . . . 398 
Kawada, Yukiyosi. Independence of Quadratic Forms in Normally Correlated Variables . . . 614 
Keeney, R. D., and E. J. Gumbel. The Geometric Range for Distributions of Cauchy's Type . . . 133 
Keeney, R. D., and E. J. Gumbel. The Extremal Quotient . . . 523 
Kimball, Bradford F. On the Asymptotic Distribution of the Sum of Powers of Unit Frequency Differences . . . 263 
Koopmans, T. C., and O. Reiersøl. The Identification of Structural Characteristics . . . 165 
Krishna Iyer, P. V. The Theory of Probability Distributions of Points on a Lattice . . . 198 
Lehmann, E. L. Some Principles of the Theory of Testing Hypotheses . . . 1 
Lehmann, E. L., and J. L. Hodges, Jr. Some Problems in Minimax Point Estimation . . . 182 
Lehmann, E. L., and Charles Stein. Completeness in the Sequential Case . . . 376 
Link, Richard F. The Sampling Distribution of the Ratio of Two Ranges from Independent Samples . . . 112 
Loève, M. Fundamental Limit Theorems of Probability Theory . . . 321 
Massey, Frank J., Jr. A Note on the Estimation of a Distribution Function by Confidence Limits . . . 116 
Massey, F. J., Jr. A Note on the Power of a Non-Parametric Test . . . 440 
Morrison, Winifred J., and Jack Sherman. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix . . . 124 
Mosteller, Frederick, and John W. Tukey. Significance Levels for a k-Sample Slippage Test . . . 120 
Nanda, D. N. Distribution of the Sum of Roots of a Determinantal Equation Under a Certain Condition . . . 432 
Narain, R. D. On the Completely Unbiassed Character of Tests of Independence in Multivariate Normal Systems . . . 293 
Noack, Albert. A Class of Random Variables with Discrete Distributions . . . 127 






















Noether, Gottfried Emanuel. Asymptotic Properties of the Wald-Wolfowitz Test of Randomness . . . 231 
Paull, A. E. On a Preliminary Test for Pooling Mean Squares in the Analysis of Variance . . . 539 
Pillai, K. C. S. On the Distributions of Midrange and Semi-Range in Samples from a Normal Population . . . 100 
Reiersøl, O., and T. C. Koopmans. The Identification of Structural Characteristics . . . 165 
Robbins, Herbert, and Raghu Raj Bahadur. The Problem of the Greater Mean . . . 469 
Rubin, Herman, and T. W. Anderson. The Asymptotic Properties of Estimates of the Parameters of a Single Equation in a Complete System of Stochastic Equations . . . 570 
Scott, Elizabeth L. Note on Consistent Estimates of the Linear Structural Relation Between Two Variables . . . 284 
Seth, G. R. On the Distribution of the Two Closest Among a Set of Three Observations . . . 298 
Sherman, B. A Random Variable Related to the Spacing of Sample Values . . . 339 
Sherman, Jack, and Winifred J. Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix . . . 124 
Shrikhande, S. S. The Impossibility of Certain Symmetrical Balanced Incomplete Block Designs . . . 106 
Stein, Charles. Unbiased Estimates with Minimum Variance . . . 406 
Stein, Charles, and E. L. Lehmann. Completeness in the Sequential Case . . . 376 
Tukey, John W., and Murray F. Freeman. Transformations Related to the Angular and the Square Root . . . 607 
Tukey, John W., and Frederick Mosteller. Significance Levels for a k-Sample Slippage Test . . . 120 
von Schelling, Hermann. A Second Formula for the Partial Sum of Hypergeometric Series Having Unity as the Fourth Argument . . . 458 
von Schelling, H., and E. J. Gumbel. The Distribution of the Number of Exceedances . . . 247 
Wald, A., and J. Wolfowitz. Bayes Solutions of Sequential Decision Problems . . . 82 
Walsh, John E. Some Estimates and Tests Based on the r Smallest Values in a Sample . . . 386 
Walsh, John E. Some Nonparametric Tests of Whether the Largest Observations of a Set are Too Large or Too Small . . . 583 
Wolfowitz, J. Minimax Estimates of the Mean of a Normal Distribution with Known Variance . . . 218 
Wolfowitz, J., and A. Wald. Bayes Solutions of Sequential Decision Problems . . . 82 
Zeigler, R. K. A Note on the Asymptotic Simultaneous Distribution of the Sample Median and the Mean Deviation from the Sample Median . . . 452 

























Miscellaneous 

Abstracts of Papers . . . 139, 302, 401, 616 
Minutes of the Annual Membership Meeting, New York, December 28, 1949 . . . 155 
News and Notices . . . 147, 315, 402, 620 
Report of the Berkeley Meeting of the Institute . . . 622 
Report of the Chapel Hill Meeting of the Institute . . . 317 
Report of the Chicago Meeting of the Institute . . . 407 
Report of the New York Meeting of the Institute . . . 151 
Report of the President of the Institute . . . 155 
Report of the Secretary-Treasurer of the Institute . . . 100 
Report of the Editor of the Annals . . . 103 








SOME PRINCIPLES OF THE THEORY OF TESTING HYPOTHESES¹ 

By E. L. Lehmann 
University of California, Berkeley 

Introduction; 

1. The likelihood ratio principle. The development of a theory of hypothesis 
testing (as contrasted with the consideration of particular cases) may be said 
to have begun with the 1928 paper of Neyman and Pearson [16]. For in this 
paper the fundamental fact is pointed out that in selecting a suitable test one 
must take into account not only the hypothesis but also the alternatives against 
which the hypothesis is to be tested, and on this basis the likelihood ratio principle 
is proposed as a generally applicable criterion. This principle has proved 
extremely successful; nearly all tests now in use for testing parametric hypotheses 
are likelihood ratio tests (for an extension to the non-parametric case 
see [33]), and many of them have been shown to possess various optimum properties. 

At least in the parametric case the likelihood ratio test has a number of desirable 
properties. Among these we mention: 

(i) Frequently it is easy to apply and leads to a definite and reasonable test. 

(ii) If the sample size is large, and if certain regularity conditions are satisfied, 
an approximate solution can be given for the distribution problems that arise 
in the determination of size and power of the test (Wilks [32], Wald [25]). In 
fact, if the likelihood ratio is denoted by $\lambda$, $-2 \log \lambda$ approximately has a central 
$\chi^2$-distribution under the hypothesis and a non-central $\chi^2$-distribution under the 
alternatives. The number of degrees of freedom in these distributions equals the 
number of constraints imposed by the hypothesis. 

(iii) As was shown by Wald [25], under certain restrictions the likelihood ratio 
test possesses various pleasant large sample properties. 
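
Property (ii) can be checked numerically in the simplest case. The sketch below is an illustration only and is not taken from the paper: it assumes independent observations from a normal distribution with unknown mean and unit variance, and the hypothesis that the mean is 0, so that $-2 \log \lambda = n\bar{x}^2$ and the hypothesis imposes one constraint. The function names, the sample size, and the tabulated critical value 3.841 (approximate upper 5% point of $\chi^2$ with 1 degree of freedom) are choices made for this example.

```python
import random

# Approximate upper 5% point of the chi-square distribution with 1 d.f.
CHI2_1_UPPER_5 = 3.841


def minus_two_log_lambda(sample):
    """-2 log(lambda) for H: mean = 0 in a N(mean, 1) model.

    lambda = L(0) / L(xbar) = exp(-n * xbar**2 / 2), hence the value n * xbar**2.
    """
    n = len(sample)
    xbar = sum(sample) / n
    return n * xbar * xbar


def rejection_rate(n, reps, critical_value, seed=0):
    """Monte Carlo estimate of P(-2 log(lambda) > critical_value) under H."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if minus_two_log_lambda(sample) > critical_value:
            hits += 1
    return hits / reps
```

Calling `rejection_rate(20, 20000, CHI2_1_UPPER_5)` should return a value close to 0.05. In this particular model the $\chi^2$ law is exact for every sample size, so the example illustrates the form of the statement rather than the quality of the large-sample approximation.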

In view of this, one may feel that the likelihood ratio principle, although perhaps 
not always leading to the optimum test, is completely satisfactory, and 
that a more systematic study of the problem of test selection is not necessary. 
Unfortunately, against the pleasant properties just mentioned there stands a 
very unpleasant one. Cases exist in which the likelihood ratio test is not only 
unsatisfactory but worse than useless, and hence the likelihood ratio principle 
is not reliable. Examples of this kind were constructed independently by H. 
Rubin and C. Stein; the following is Stein's example. 

1 Parts of this paper were presented in an invited address at the meeting of the Institute 
of Mathematical Statistics on Dec. 30, 1948, in Cleveland, Ohio. 





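Let $X$ be a random variable capable of taking on the values $0, \pm 1, \pm 2$ with 
probabilities as indicated: 

$$
\begin{array}{l|ccccc}
x & -2 & 2 & -1 & 1 & 0 \\ \hline
\text{Hypothesis } H: & \dfrac{\alpha}{2} & \dfrac{\alpha}{2} & \dfrac{1}{2} - \alpha & \dfrac{1}{2} - \alpha & \alpha \\[2ex]
\text{Alternatives:} & pC & (1-p)C & \dfrac{1-C}{1-\alpha}\left(\dfrac{1}{2} - \alpha\right) & \dfrac{1-C}{1-\alpha}\left(\dfrac{1}{2} - \alpha\right) & \alpha\,\dfrac{1-C}{1-\alpha}
\end{array}
$$

Here $\alpha$, $C$ are constants, $0 < \alpha \le \tfrac{1}{2}$, $\dfrac{\alpha}{2-\alpha} < C < \alpha$, and $p$ ranges over the 
interval [0, 1]. 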

It is desired to test the hypothesis $H$ at significance level $\alpha$. The likelihood 
ratio test rejects when $X = \pm 2$, and hence its power is $C$ against each alternative. 
Since $C < \alpha$, this test is literally worse than useless, for a test with power 
$\alpha$ can be obtained without observing $X$ at all, simply by the use of a table of 
random numbers. It is worth noting that the test which rejects $H$ when $X = 0$ 
has power $\alpha\,\dfrac{1-C}{1-\alpha} > \alpha$, so that a reasonable test of the hypothesis in question 
does exist. 
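
The arithmetic of Stein's example can be verified exactly. The sketch below is illustrative and not part of the paper: it assumes the probabilities under $H$ are $\alpha/2$, $\alpha/2$, $\tfrac12-\alpha$, $\tfrac12-\alpha$, $\alpha$ at $x = -2, 2, -1, 1, 0$, and under the alternatives $pC$, $(1-p)C$, $r(\tfrac12-\alpha)$, $r(\tfrac12-\alpha)$, $r\alpha$ with $r = (1-C)/(1-\alpha)$. The function names and the numerical values of $\alpha$, $C$, $p$ are hypothetical choices for this example.

```python
from fractions import Fraction


def stein_example(alpha, C, p):
    """Return (null, alt): probability tables over x in {-2, 2, -1, 1, 0}.

    The table layout is an assumption stated in the accompanying text.
    """
    a, C, p = Fraction(alpha), Fraction(C), Fraction(p)
    half = Fraction(1, 2)
    # Constraints on the constants, as in the example.
    assert 0 < a <= half and a / (2 - a) < C < a and 0 <= p <= 1
    r = (1 - C) / (1 - a)
    null = {-2: a / 2, 2: a / 2, -1: half - a, 1: half - a, 0: a}
    alt = {-2: p * C, 2: (1 - p) * C, -1: r * (half - a), 1: r * (half - a), 0: r * a}
    return null, alt


def rejection_prob(dist, region):
    """Probability that X falls in the rejection region."""
    return sum(dist[x] for x in region)


# Hypothetical constants: alpha = 1/10, C = 3/40 (between 1/19 and 1/10), p = 1/3.
null, alt = stein_example(Fraction(1, 10), Fraction(3, 40), Fraction(1, 3))
lr_region = (-2, 2)   # likelihood ratio test rejects when X = -2 or 2
zero_region = (0,)    # competing test rejects when X = 0

lr_size, lr_power = rejection_prob(null, lr_region), rejection_prob(alt, lr_region)
zero_size, zero_power = rejection_prob(null, zero_region), rejection_prob(alt, zero_region)
# lr_size == 1/10, lr_power == 3/40 (below alpha)
# zero_size == 1/10, zero_power == 37/360 (above alpha)
```

Exact rational arithmetic is used so that the comparisons with $\alpha$ are not affected by rounding.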

The existence of such examples gives added importance to the problem of 
developing a systematic theory of hypothesis testing. It is the purpose of the 
present paper to give a brief survey of the work done on some aspects of such a 
theory and to indicate certain extensions and modifications of the existing theory. 
Some examples and applications will be considered. These will be restricted to 
parametric problems. For applications to testing non-parametric hypotheses 
see [12]. 


The results of sections 5 and 8 were obtained jointly by Gilbert Hunt and 
Charles Stein in 1945. They have not been published and were communicated 
to me by Professor Stein. I should like to express to him my gratitude for acquainting 
me with this material and for giving me permission to include it in 
this paper. I should also like to acknowledge my indebtedness to Professor 
Henry Scheffé who read the manuscript and made many helpful suggestions. 

2. Formulation of the problem. The problem of testing a statistical hypothesis 
was formulated by Neyman and Pearson [18] as follows. 

A random variable $X$ is known to be distributed over a space $\mathfrak{X}$ according to 
some member of a family of probability distributions $\{P_\theta\}$, $\theta \in \Omega$. It will be 
assumed here that there is specified an additive class $\mathfrak{B}$ of sets in $\mathfrak{X}$, and that 
the probability distributions $P_\theta$ are probability measures defined over $\mathfrak{B}$. All 
sets and real valued functions mentioned in this paper will be assumed measurable 
$\mathfrak{B}$ unless otherwise stated. If $B \in \mathfrak{B}$, we shall write for the measure assigned 
to $B$ by $P_\theta$ interchangeably $P_\theta(X \in B)$, $P_\theta^X(B)$, and if there is no possibility 
of confusion, $P_\theta(B)$. Throughout most of the paper it will be assumed 
that the probability measures $P_\theta$ are absolutely continuous with respect to a 
given sigma finite measure $\mu$ defined over $\mathfrak{B}$, so that there exist non-negative 
functions $f_\theta$ such that 

(2.1) $P_\theta(B) = \int_B f_\theta(x)\, d\mu(x)$. 

We shall then say that $f_\theta(x)$ is a generalized probability density w.r. to $\mu$. 

A statistical hypothesis $H$ specifies a subset $\omega$ of $\Omega$, and states that the distribution 
of $X$ is some $P_\theta$ with $\theta \in \omega$. A test of $H$ is any subset $w$ of $\mathfrak{X}$, the convention 
being that $H$ is rejected if the observed value $x$ of $X$ is in $w$, and that 
in the contrary case $H$ is accepted. The selection of $w$ is to be made as follows. 
A number $\alpha$ is given, $0 < \alpha < 1$, the level of significance, and $w$ must be such 
that 

(2.2) $P_\theta(w) = \alpha$ for all $\theta \in \omega$. 

Subject to this restriction it is desired to maximize $P_\theta(w)$ for $\theta$ in $\Omega - \omega$. The 
interpretation of these conditions is immediate. Since $P_\theta(w)$ is the probability 
of rejecting $H$ computed under the assumption that $P_\theta$ is the distribution of 
$X$, equation (2.2) states that the probability of rejecting $H$ is to be $\alpha$ (usually 
some small number such as .01 or .05) whenever $H$ is true. Similarly the second 
condition expresses the fact that $H$ is to be rejected with high probability when 
$\theta$ is in $\Omega - \omega$. 

Naturally the second condition is not to be taken literally but rather as a 
loosely stated principle of choice. For in general there will exist a unique set 
$w$ maximizing $P_{\theta_1}(w)$ for any given $\theta_1 \in \Omega - \omega$, but this $w$ will change with $\theta_1$. 
The condition has a clear meaning only in the case that the set $\Omega - \omega$ contains 
only a single point, and in a few special problems in which the same set $w$ maximizes 
$P_\theta(w)$ for all $\theta \in \Omega - \omega$. In the general case there are available two main 
methods for making the condition precise. One may restrict consideration to 
some class of "nice" tests, so that within this class the maximization of $P_\theta(w)$ 
can be achieved uniformly for $\theta \in \Omega - \omega$. Alternatively, instead of asking that 
a local optimum property hold uniformly, one may look for a test whose power 
function possesses some optimum property in the large. Both of these approaches 
have an element of arbitrariness: in the first, the selection of a class 
of nice tests; in the second, the choice of an appropriate optimum property. 
Fortunately, in a number of important special cases, both methods, for various 
reasonable definitions, lead to the same test. 

Before proceeding with this development, we shall modify the formulation 
of the problem slightly. First, as has been pointed out by many writers, it seems 
more natural to replace (2.2) by 

(2.3) $P_\theta(w) \le \alpha$ for all $\theta \in \omega$. 

Secondly, we shall permit "randomized" tests (see [11, 29]), that is, instead of 
demanding that the statistician decide for each value of $x$ whether to accept 
or to reject $H$, we shall allow the possibility that for certain $x$ the decision be 
reached by means of some chance device such as a table of random numbers. 
By a test of $H$ we shall therefore mean a function $\phi$ from $\mathfrak{X}$ to the interval 
[0, 1], with the convention that when $x$ is the observed value of $X$ some chance 
experiment with two possible outcomes $R$, $\bar{R}$ will be performed such 
that $P(R) = \phi(x)$, and that $H$ will be rejected when the outcome is $R$ and will 
otherwise be accepted. The case of a non-randomized test $w$ clearly is obtained 
as a special case by taking for $\phi$ the characteristic function of the set $w$. 

For a test $\phi$ the probability of rejection is given by 

(2.4) $\int \phi(x)\, dP_\theta(x) = E_\theta \phi(X)$ 

where $E_\theta$ denotes expectation computed with respect to the probability distribution 
$P_\theta$. We therefore obtain the following formulation of the problem: 
To determine a test function $\phi$ ($0 \le \phi(x) \le 1$) which maximizes $E_\theta \phi(X)$, the 
power of $\phi$ against the alternative $\theta$, for $\theta$ in $\Omega - \omega$ subject to the condition 

(2.5) $E_\theta \phi(X) \le \alpha$ for all $\theta \in \omega$. 

In this connection it is convenient to use the term "level of significance" for 
the preassigned number $\alpha$, and to define the size of the test $\phi$ as 

(2.6) $\sup_{\theta \in \omega} E_\theta \phi(X)$. 

Except in the trivial case that there exists a test of size $< \alpha$ whose power is 1 
against all alternatives, the size of any optimum test (in fact, of any admissible 
test) equals the level of significance. 
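
The rejection probability (2.4) and the size (2.6) of a randomized test are easy to compute when $X$ is discrete. The following sketch is illustrative and not from the paper: it assumes $X$ is binomial with $n = 10$, takes the hypothesis $p \le \tfrac12$, and uses a hypothetical test function that always rejects for $x \ge 9$ and rejects with probability $67/75$ when $x = 8$. The supremum in (2.6) is approximated by a maximum over a finite grid of null values, which is itself an assumption of the example.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p); exact when p is a Fraction."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def rejection_probability(phi, n, p):
    """E_p phi(X): probability that the randomized test phi rejects H."""
    return sum(phi[x] * binomial_pmf(n, p, x) for x in range(n + 1))


def size(phi, n, null_grid):
    """Largest rejection probability over a finite grid of null values.

    This only approximates the supremum in (2.6), which runs over all of omega.
    """
    return max(rejection_probability(phi, n, p) for p in null_grid)


# Illustrative test (an assumption, not from the paper): phi[x] is the
# probability of rejecting H when x successes are observed.
N = 10
PHI = [Fraction(0)] * 8 + [Fraction(67, 75), Fraction(1), Fraction(1)]
NULL_GRID = [Fraction(k, 20) for k in range(11)]  # p = 0, 1/20, ..., 1/2

TEST_SIZE = size(PHI, N, NULL_GRID)  # equals 1/20, attained at p = 1/2
```

The randomization at $x = 8$ is what allows the size to equal $1/20$ exactly; a non-randomized test with the same cutoffs would have size $11/1024$ or $56/1024$.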

3. Testing against a simple alternative. A complete solution of the problem 
formulated in the last section is available only in the case that $\omega$ and $\Omega - \omega$ 
each contains only a single point, that is, in the case that both the hypothesis 
and the alternative are simple. The solution is then given by the fundamental 
lemma of Neyman and Pearson [18], which we may state in the following 
slightly more complete form. 

Theorem 3.1. Let 

(3.1) $P_\theta(A) = \int_A f_\theta(x)\, d\mu(x)$. 

(a) For testing the hypothesis $H: \theta = \theta_0$ against the alternative $\theta = \theta_1$ at level of 
significance $\alpha$, there exists a number $k$ and a test $\phi$ of size $\alpha$ such that 

(3.2) $\phi(x) = 1$ when $f_{\theta_1}(x) > k f_{\theta_0}(x)$, 
      $\phi(x) = 0$ when $f_{\theta_1}(x) < k f_{\theta_0}(x)$. 

(b) If $f_{\theta_0}(x)$ and $f_{\theta_1}(x)$ are $\neq 0$ for all $x$ in $\mathfrak{X}$, then a test $\phi$ is most powerful for 
testing $H$ against $\theta = \theta_1$ if and only if it satisfies (3.2) except possibly on a set 
of $\mu$-measure 0.² (Note that the number $k$ of (3.2) is essentially unique.) 


2 Throughout the paper we shall consider two tests as equal if they differ only on a set 
of $\mu$-measure 0. 





The second half of the theorem may be paraphrased by saying that under 
the conditions stated the most powerful test is uniquely determined by (3.2) 
except on the set on which 

(3.3) $f_{\theta_1}(x) = k f_{\theta_0}(x)$. 

On this set the value of $\phi$ may be assigned arbitrarily provided the resulting 
test has size $\alpha$. If in particular the set on which (3.3) holds has measure 0, the 
most powerful test is unique. 
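
On a finite sample space the test described by the fundamental lemma can be built directly: order the points by the ratio of the two densities, reject outright where the ratio is largest, and randomize at the boundary so that the size is exactly the level of significance. The sketch below is a minimal illustration, not the paper's construction: it assumes both densities are given as dictionaries over the same finite set with the null density positive everywhere, and the names `neyman_pearson`, `F0`, `F1` are hypothetical.

```python
from fractions import Fraction


def neyman_pearson(f0, f1, alpha):
    """Return (phi, k) with phi = 1 where f1 > k*f0 and phi = 0 where f1 < k*f0.

    f0, f1: dicts mapping each sample point to its probability (f0 > 0 assumed).
    phi is chosen so that the rejection probability under f0 equals alpha.
    """
    # Points with the largest ratio f1/f0 are the first to enter the rejection set.
    order = sorted(f0, key=lambda x: f1[x] / f0[x], reverse=True)
    phi = {x: Fraction(0) for x in f0}
    remaining = Fraction(alpha)  # size still to be allocated
    k = None
    for x in order:
        k = f1[x] / f0[x]
        if f0[x] <= remaining:
            phi[x] = Fraction(1)
            remaining -= f0[x]
        else:
            phi[x] = remaining / f0[x]  # randomize on the boundary set
            break
    return phi, k


def expectation(phi, f):
    """E phi(X) when X has density f."""
    return sum(phi[x] * f[x] for x in f)


# Hypothetical densities on {0, 1, 2, 3}.
F0 = {0: Fraction(2, 5), 1: Fraction(3, 10), 2: Fraction(1, 5), 3: Fraction(1, 10)}
F1 = {0: Fraction(1, 10), 1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(2, 5)}
PHI_NP, K = neyman_pearson(F0, F1, Fraction(1, 5))
# PHI_NP rejects at x = 3, rejects with probability 1/2 at x = 2; K = 3/2.
```

With these inputs the test has size $1/5$ and power $11/20$. On the boundary set where the ratio equals $k$ the value of the test function is not determined by (3.2); here it is fixed by the requirement on the size.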

It should be mentioned that (3.1) is no restriction since any two probability 
measures $P_1$, $P_2$ defined over a common additive class can be represented in 
this form with $\mu = P_1 + P_2$. If the assumption of (b) is not satisfied, the 
theorem is still true in essence but some trivial modifications are necessary. 

No such complete solution is available for the problem of testing a composite 
hypothesis against a simple alternative. However, as was shown in [11], this 
problem may in many cases be reduced to the one just considered. Let the 
hypothesis state that $\theta$ is an element of $\omega$, and consider the simple alternative 
$\theta = \theta_1$. Suppose that an additive class of sets has been defined on $\omega$ (in most 
of the applications $\omega$ is a subset of Euclidean space, and the additive class is 
formed by the Borel sets contained in $\omega$). Then for any probability distribution 
$\lambda$ over $\omega$, 

(3.4) $h_\lambda(x) = \int_\omega f_\theta(x)\, d\lambda(\theta)$ 

is a probability density function with respect to $\mu$. 

Under certain conditions to be stated below, the most powerful test $\phi_\lambda$ for 
testing the simple hypothesis $H_\lambda$ that $X$ is distributed with probability density 
$h_\lambda$ against the alternative $f_{\theta_1}$ is also most powerful for testing the original hypothesis 
$H$ against the same alternative. This is essentially the Bayes approach 
developed by Wald for his general decision theory, and in fact, under the conditions 
which we shall state, $\lambda$ is a least favorable distribution over $\omega$ in the 
following sense. Let $\beta_\lambda$ be the power of $\phi_\lambda$ against $f_{\theta_1}$, and for any distribution 
$\lambda^*$ over $\omega$ denote by $H_{\lambda^*}$, $\phi_{\lambda^*}$, $\beta_{\lambda^*}$ the associated hypothesis, the most powerful 
test for testing it against $f_{\theta_1}$, and the power of this test respectively. Then $\lambda$ 
is said to be least favorable if for all $\lambda^*$ 

(3.5) $\beta_\lambda \le \beta_{\lambda^*}$. 

Theorem 3.2. Suppose there exists a probability distribution $\lambda$ over $\omega$ such that 
the most powerful test $\phi_\lambda$ of size $\alpha$ for testing $H_\lambda$ against $f_{\theta_1}$ is of size $\alpha$ also with 
respect to the original hypothesis $H$. Then 

(i) $\phi_\lambda$ is most powerful for testing $H$ against $f_{\theta_1}$; 

(ii) $\lambda$ is a least favorable distribution. 

Also, if $\phi_\lambda$ is the unique most powerful test for testing $H_\lambda$ against $f_{\theta_1}$, it is the 
unique most powerful test for testing $H$ against $f_{\theta_1}$. 

These results are essentially contained in Wald's work (see for example 
theorem 4.8 of [26]). 





There are many trivial applications of this theorem to finding most powerful 
tests of one-sided hypotheses concerning a single real-valued parameter, such 
as testing $H: p \le p_0$ against $p = p_1$ ($p_0 < p_1$) when $X$ has a binomial distribution 
with parameter $p$. As is well known, it turns out in a number of these cases 
that the most powerful tests are in fact uniformly most powerful against the 
one-sided class of alternatives. 

In [11] Theorem 3.2 was used to determine most powerful tests of certain 
hypotheses concerning normal distributions. As an example, consider the case 
that $X_1, \dots, X_n$ are independently normally distributed with common mean 
$\xi$ and variance $\sigma^2$. Denote by $H_1$ and $H_2$ the hypotheses $\sigma = 1$ and $\xi = 0$ respectively, 
and let the alternative be: $\xi = \xi_1$, $\sigma = \sigma_1$. Then the most powerful 
test of $H_1$ rejects if 

(3.6) $\sum (x_i - \bar{x})^2 < k_1$ when $\sigma_1 < 1$, 
      $\sum (x_i - \xi_1)^2 > c_1$ when $\sigma_1 > 1$, 

and accepts otherwise. Here $k_1$ and $c_1$ depend only on the level of significance, 
that is, are independent of $\xi_1$, $\sigma_1$. If $\xi_1 > 0$, the most powerful test for testing 
$H_2$ rejects if 

2(aii — b) 2 g JcJ) 2 when a < 


(3-7) 


§ c s when a 


and accepts $H_2$ otherwise. Here $k_2$ and $c_2$ depend only on $\alpha$, while $b$ depends on 
$\xi_1$, $\sigma_1$ and $\alpha$. 

These results indicate that even when the class of alternatives is larger than 
in the above problems, some improvement over the standard tests may be 
possible provided good power is desired only against a narrow class of alternatives. 

4. Sufficient statistics. Before treating the problem of composite alternatives 
we shall consider an important simplification that can be obtained by making 
use of sufficient statistics. This notion was introduced by R. A. Fisher and was 

m which * lies, a set in the range of t is 8 sid to be raensurablo'if the eorraimn” 

r "'‘ST* *■ , D “ ot » the »' ST t 

he, shown C^™ < ?.«r Tf l8 ^ fined 0V6 ’ * 

$P(B \mid t)$ of $B$ given $T = t$ uniquely up to a set of $P^T$ measure zero by the equation 

(4.1) $P(B \cap t^{-1}(A)) = \int_A P(B \mid t)\, dP^T(t)$ for all measurable sets $A$ in the range of $t$. 

Suppose now that we are given a class of distributions for $X$, 
$\mathfrak{F} = \{P_\theta,\ \theta \in \Omega\}$. Denote by $P_\theta(B \mid t)$ the conditional probability of $B$ given 
$T = t$ computed for the distribution $P_\theta$. The statistic $T$ is said to be a sufficient 
statistic for $\mathfrak{F}$ (or for $\theta$) if for every $B \in \mathfrak{B}$ there exists a determination of 
$P_\theta(B \mid t)$ that is independent of $\theta$. 

According to the above definition of statistic, $t(x)$ is an element of a measurable 
partition. However, one may consider instead any function $t^*$ for which 
$t^*(x) = t^*(x')$ if and only if $t(x) = t(x')$, that is, any function that leads to this 
partition; the values that the function takes on are really immaterial. It will 
be convenient here to use this wider definition of statistic. For a rigorous treatment 
of some of the problems that will be referred to one needs to define an 
equivalence of statistics and to include in this definition the appropriate nullset 
considerations. A detailed account of these matters is given in [2] and [10]. 

From our present point of view tests are compared solely in terms of their 
power functions. On this basis two tests $\phi_1$ and $\phi_2$ may be considered equivalent 
if they have identical power, that is, if 

(4.2) $E_\theta \phi_1(X) = E_\theta \phi_2(X)$ for all $\theta \in \Omega$. 

We can then state 

Theorem 4.1. If $T$ is a sufficient statistic for $\theta$ and $\phi(X)$ any test of a hypothesis 
concerning $\theta$ then there exists an equivalent test that is a function of $T$ only. 

The proof of this theorem is immediate since 

(4.3) $\psi(T) = E[\phi(X) \mid T]$ 

is such a test. 
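
Theorem 4.1 can be checked by direct enumeration in a small case. The sketch below is illustrative and not from the paper: it assumes $n$ independent Bernoulli trials with success probability $p$, for which the number of successes $T$ is sufficient, and starts from a deliberately wasteful test that looks only at the first trial. The function names are hypothetical. The conditional expectation given $T = t$ is computed for two different values of $p$ to show that it does not depend on $p$, and the power functions of the original test and of its conditional expectation are compared.

```python
from fractions import Fraction
from itertools import product


def prob(x, p):
    """Probability of the 0/1 sequence x under independent Bernoulli(p) trials."""
    s = sum(x)
    return p**s * (1 - p)**(len(x) - s)


def conditional_expectation(phi, n, p):
    """psi(t) = E_p[phi(X) | T = t] for t = 0..n, where T = sum of coordinates.

    Requires 0 < p < 1 so that every value of T has positive probability.
    """
    num = [Fraction(0)] * (n + 1)
    den = [Fraction(0)] * (n + 1)
    for x in product((0, 1), repeat=n):
        t = sum(x)
        num[t] += phi(x) * prob(x, p)
        den[t] += prob(x, p)
    return [a / b for a, b in zip(num, den)]


def power(test, n, p):
    """E_p test(X), by enumeration of all 2**n sample points."""
    return sum(test(x) * prob(x, p) for x in product((0, 1), repeat=n))


def first_trial_test(x):
    """Illustrative test: reject exactly when the first trial is a success."""
    return x[0]


N_TRIALS = 4
PSI = conditional_expectation(first_trial_test, N_TRIALS, Fraction(1, 3))
# PSI == [0, 1/4, 1/2, 3/4, 1], that is psi(t) = t / n


def reduced_test(x):
    """The equivalent test of Theorem 4.1, a function of T = sum(x) only."""
    return PSI[sum(x)]
```

Both `first_trial_test` and `reduced_test` have power $p$ at every $p$, so they are equivalent in the sense of (4.2), although the second is randomized and depends on the data only through $T$.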

It follows from Theorem 4.1 that we lose nothing by restricting consideration 
to tests based on a sufficient statistic.³ The problem of determining whether 
or not some statistic is sufficient for a given family of distributions is simplified 
through the use of a criterion for sufficiency that can be checked on sight. This 
criterion is due to Neyman [13] who proved it in a somewhat special setting, 
and was recently proved in a very general form by Halmos and Savage [2]. 
It states that if $\mathfrak{F} = \{p_\theta\}$, $\theta \in \Omega$ is a family of generalized probability densities 
for $X$, then under certain mild restrictions a necessary and sufficient condition 
for $T = t(X)$ to be a sufficient statistic for $\mathfrak{F}$ is that $p_\theta(x)$ factors into one factor 
depending on $\theta$ but on $x$ only through $t(x)$ and a second factor depending 
only on $x$. 

The question arises as to which of various sufficient statistics to use. Since 
the purpose of introducing sufficient statistics is to reduce the complexity of a 
given statistical problem, one is led to seek a sufficient statistic that reduces 
the problem as far as possible and hence to the notion of a minimal sufficient 
statistic, a sufficient statistic $T$ being minimal if it is a function of every other 
sufficient statistic (see [10]). It can be shown under fairly general conditions 
that a minimal sufficient statistic exists, and one can give an explicit construction 
for it. 


3 A justification for the use of sufficient statistics in the general statistical decision problem 
was given in [2]. 





As one would expect it turns out that the sufficient statistics customarily 
associated with various families of distributions are actually minimal. Thus for 
example, if $X_1, \dots, X_n$ are independently normally distributed with common 
mean $\xi$ and variance $\sigma^2$, the statistic $(\bar{X}, \sum (X_i - \bar{X})^2)$ is a minimal sufficient 
statistic for $\theta = (\xi, \sigma^2)$. If $X_1, \dots, X_n$ are independently uniformly distributed 
over $(0, \theta)$, $\max(X_1, \dots, X_n)$ is the minimal sufficient statistic for $\theta$. If $\mathfrak{F}$ is 
the family of distributions according to which $X_1, \dots, X_n$ are identically independently 
distributed according to an arbitrary univariate distribution (or 
according to an arbitrary probability density with respect to a fixed univariate 
measure), then the minimal sufficient statistic is obtained by defining for each 
point $x = (x_1, \dots, x_n)$ the set $t(x)$ as the set of points obtainable from $x$ by 
permutation of coordinates. Alternatively one can define it by $t(x_1, \dots, x_n) = 
(\sum x_i, \sum x_i^2, \dots, \sum x_i^n)$. 
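
The two descriptions of the statistic in the last example can be compared on a small sample. The sketch below is illustrative only: the function names are hypothetical, the sorted sample is used as a convenient label for the set of permutations of a point, and integer data are chosen so that the power sums are exact.

```python
from itertools import permutations


def order_statistic(x):
    """Label for the set of all permutations of x: the sorted coordinates."""
    return tuple(sorted(x))


def power_sums(x):
    """(sum x_i, sum x_i^2, ..., sum x_i^n) for a sample of length n."""
    n = len(x)
    return tuple(sum(v**k for v in x) for k in range(1, n + 1))


SAMPLE = (3, 1, 2)
# Every rearrangement of the sample gives the same value of either statistic.
ORDER_VALUES = {order_statistic(p) for p in permutations(SAMPLE)}
POWER_VALUES = {power_sums(p) for p in permutations(SAMPLE)}
# ORDER_VALUES == {(1, 2, 3)} and POWER_VALUES == {(6, 14, 36)}
```

A sample that is not a rearrangement, such as `(1, 1, 4)`, has the same first power sum 6 but different higher power sums, so the two samples are separated by the statistic.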

5. The principle of invariance. The notion of invariance was introduced into 
the statistical literature in the writings of R. A. Fisher, Hotelling, Pitman }150) 
and others, in connection with various special problems. A general formulation 
was given by Hunt and Stein who, in an unpublished paper [5], utilized 
this notion to find most stringent tests, and who obtained the examples of uniformly 
most powerful invariant tests that will be given below. The point of 
view in the present section is different from theirs however, since here invariance 
will only be considered as an intuitively appealing restriction that one may 
wish to impose on statistical tests. 

We shall begin by considering an example. Suppose it were known that the 
height of people is distributed about a known mean, which for convenience we 
shall take to be zero, either according to a normal or to a Cauchy distribution, 
with unknown scale factor so that either 

(5.1) $f_\theta(x) = \dfrac{1}{\sqrt{2\pi}\,\theta}\, e^{-x^2/2\theta^2}$ 

or 

(5.2) $f_\theta(x) = \dfrac{1}{\pi}\, \dfrac{\theta}{\theta^2 + x^2}$, $\quad 0 < \theta < \infty$. 


Suppose we wish to test from a sample $X_1, \dots, X_n$ the hypothesis $H$ that the 
true probability density belongs to the first of these classes against the alternative 
that it belongs to the second. Then it seems desirable that the decision of 
whether or not to accept $H$ should be independent of the scale adopted for 
measuring the heights. For otherwise one worker expressing his data in feet 
might reject $H$ while another worker using the same data but expressing them 
mi t inlw r f Ch ?f ?? l ’ ary decisian (In this connection see for example 
is -i f‘ f 01 , A Mce .. test functl011 <t> therefore would he independent of the 
choice of scale, i.e., it would satisfy the condition 


(5.3) 4>[cx i, • • • , cx n ) - , *„) for all c > 0 and for all fa , • ■ • ,*„) 

except possibly on a set N, independent of e and of measure zero, 



THEORY OF TESTING HYPOTHESES 


9 


On analyzing this problem one is led to the following observation. Multiplying
each of the random variables $X_1, \cdots, X_n$ by the same constant leaves
both $\omega$ and $\Omega - \omega$ invariant, i.e., if the $X$'s are normally distributed with zero
mean and arbitrary scale so are $cX_1, \cdots, cX_n$, and analogously for the Cauchy
distributions. It is this fact that makes it so desirable to have $\phi$ invariant under
multiplication of the $x$'s by a common constant.
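
As a minimal illustration of condition (5.3), the following Python sketch evaluates a scale-free statistic on the same data expressed in feet and in inches. The statistic, the cutoff 0.5, the data, and the names `scale_free_statistic` and `phi` are assumptions made for this example only; they are not taken from the text and are not claimed to give a good test of the normal against the Cauchy family.

```python
import math

def scale_free_statistic(xs):
    """Return sum(x^2) / (sum |x|)^2, unchanged when every x is multiplied by c > 0."""
    total_abs = sum(abs(x) for x in xs)
    if total_abs == 0:
        raise ValueError("all observations are zero; the ratio is undefined")
    return sum(x * x for x in xs) / total_abs ** 2

def phi(xs, cutoff=0.5):
    """Hypothetical test function: 1 means reject H, 0 means accept H."""
    return 1 if scale_free_statistic(xs) > cutoff else 0

# Illustrative deviations from the known mean (assumed data).
heights_ft = [0.31, -0.12, 0.05, -0.44, 0.27]
heights_in = [12.0 * x for x in heights_ft]   # the same data in inches

same_statistic = math.isclose(scale_free_statistic(heights_ft),
                              scale_free_statistic(heights_in))
same_decision = phi(heights_ft) == phi(heights_in)
```

Because the statistic depends on the data only through ratios of the observations, the two workers necessarily reach the same decision.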

More generally consider measurable 1:1 transformations $g$ of $\mathfrak{X}$ into itself,
and let $Y = gX$. Suppose that when $X$ is distributed according to $\theta \in \omega$, $Y$ is
distributed according to $\theta' \in \omega$ (we shall then write $\theta' = \bar{g}\theta$) and that as $\theta$
ranges over $\omega$ so does $\theta'$. Suppose that the analogous condition is satisfied for
$\Omega - \omega$, so that the problem of testing $\omega$ against $\Omega - \omega$ is left invariant under $g$.
Now whether one expresses the observations in terms of $X$ or in terms of $Y$ is
essentially a matter of choice of coordinates. The principle of invariance asks
that if such a change of coordinates leaves the problem invariant, then it should
also leave the test invariant, i.e., if $G$ is a group of measurable 1:1 transformations
of $\mathfrak{X}$ such that

(5.4) $\bar{g}\omega = \omega$ and $\bar{g}(\Omega - \omega) = \Omega - \omega$ for all $g \in G$,

then $\phi$ should satisfy the condition

(5.5) $\phi(gx) = \phi(x)$ for all $g \in G$,

and for all $x$ except on a set $N$ independent of $g$ and such that $\mu(N) = 0$. If this
condition were not satisfied, two workers using the same data but expressing
them in different coordinate systems might arrive at contrary conclusions.

As an example consider the general linear univariate hypothesis. In canonical
form $X_1, \cdots, X_r$; $X_{r+1}, \cdots, X_s$; $X_{s+1}, \cdots, X_n$ are independently normally
distributed with common variance. The means of the first $s$ variables are
unknown, the means of the last $n - s$ variables are known to be zero. The hypothesis
states that the first $r$ means are zero. Adding arbitrary constants to each of the
variables of the middle group leaves $\omega$ and $\Omega - \omega$ invariant. So does any orthogonal
transformation of the first $r$ variables, and any orthogonal transformation of
the last $n - s$ variables. Finally, the problem is also left invariant when all of
the variables are multiplied by the same constant. It is easy to see that a
function $\phi$ is invariant under these transformations if and only if it is a function of

$\sum_{i=1}^{r} x_i^2 \Big/ \sum_{i=s+1}^{n} x_i^2 .$

But, as is well known and easy to show, among all tests based on this statistic
there is a uniformly most powerful one, namely the test that rejects $H$ when

$\sum_{i=1}^{r} x_i^2 \Big/ \sum_{i=s+1}^{n} x_i^2$

is too large. Therefore, among all tests satisfying the condition of invariance
the standard test is uniformly most powerful.
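
The invariance of this ratio can be checked numerically. The sketch below is only an illustration: the data, the choice $n = 7$, $r = 2$, $s = 4$, and the helper names `lh_statistic` and `rotate` are assumptions, and a plane rotation stands in for a general orthogonal transformation.

```python
import math

def lh_statistic(x, r, s):
    """Sum of squares of the first r coordinates divided by that of the last n - s."""
    numerator = sum(v * v for v in x[:r])
    denominator = sum(v * v for v in x[s:])
    return numerator / denominator

def rotate(x, i, j, angle):
    """Orthogonal transformation: rotate coordinates i and j through `angle`."""
    y = list(x)
    c, sn = math.cos(angle), math.sin(angle)
    y[i] = c * x[i] - sn * x[j]
    y[j] = sn * x[i] + c * x[j]
    return y

x = [1.0, -2.0, 0.5, 3.0, 0.7, -1.1, 0.4]   # assumed data, n = 7
r, s = 2, 4                                  # first group: 0..1, middle: 2..3, last: 4..6

base = lh_statistic(x, r, s)
# Add constants to the middle group only.
shifted = x[:r] + [v + 10.0 for v in x[r:s]] + x[s:]
# Rotate within the first group, then within the last group.
rotated = rotate(rotate(x, 0, 1, 0.8), 4, 6, -1.3)
# Multiply all variables by the same constant.
scaled = [2.5 * v for v in x]

values = [lh_statistic(y, r, s) for y in (shifted, rotated, scaled)]
```

Each entry of `values` agrees with `base` up to rounding, as the text asserts for all four types of transformation.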



10 


E. L. LEHMANN


To formulate a corresponding reduction procedure in general, we define a
function $h$ on $\mathfrak{X}$ to be maximal invariant (under $G$) if it is invariant
and if $h(x') = h(x)$ implies the existence of $g \in G$ such that $x' = gx$. Then a
function $\phi$ on $\mathfrak{X}$ is invariant under $G$ if and only if it depends on $x$ only through
$h(x)$, that is, if there exists a function $\psi$ such that $\phi(x) = \psi[h(x)]$. Hence a
necessary and sufficient condition for a test to be invariant under $G$ is that it be
based on the statistic $Y = h(X)$. The principle of invariance therefore reduces
the problem from $X$ to $Y = h(X)$. To determine the resulting reduction,
that is, the simplification of the parameter space, one may consider
the group $\bar{G}$ of transformations over $\Omega$ induced by $G$. If $v(\theta)$ is a maximal
invariant function under $\bar{G}$, it is easily shown that the distribution of $Y$ depends
only on $v(\theta)$. Hence under the principle of invariance any two $\theta$-values with
common $v(\theta)$ (that is, such that each can be obtained from the other by a
transformation of $\bar{G}$) are identified. If in particular $v(\theta)$ is constant over $\omega$, the
hypothesis $H$, when expressed for $Y$, becomes simple, and there may even exist
a uniformly most powerful invariant test.
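
The defining property of a maximal invariant can be made concrete for the group of scale changes $x \to cx$, $c > 0$. In the following sketch the direction $x/|x|$ plays the role of $h$; this choice, the exclusion of the origin (a set of measure zero), and the names `h` and `connecting_scale` are assumptions made for illustration.

```python
import math

def norm(x):
    """Euclidean length of the point x."""
    return math.sqrt(sum(v * v for v in x))

def h(x):
    """Direction x / |x|: invariant under x -> c x with c > 0 (x must be nonzero)."""
    length = norm(x)
    if length == 0.0:
        raise ValueError("the origin is excluded from the sample space here")
    return tuple(v / length for v in x)

def connecting_scale(x, y, tol=1e-12):
    """Return c > 0 with y = c x when h(x) and h(y) agree up to rounding, else None."""
    if len(x) != len(y):
        return None
    same_orbit = all(math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
                     for a, b in zip(h(x), h(y)))
    return norm(y) / norm(x) if same_orbit else None

c = connecting_scale([3.0, 4.0], [6.0, 8.0])         # same orbit
other = connecting_scale([3.0, 4.0], [4.0, 3.0])      # different direction
opposite = connecting_scale([3.0, 4.0], [-3.0, -4.0]) # would need c < 0
```

Equality of `h` on two points thus yields the group element carrying one into the other, which is exactly what distinguishes a maximal invariant from a merely invariant function.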


Besides the example already mentioned this is the case for Hotelling's
$T^2$-problem and for the hypothesis specifying the value of a multiple correlation
coefficient. Another example is obtained when $X_1, \cdots, X_n$ are independently
identically distributed, each with probability density $p_i(x)$ where under
$H_i$, $p_i(x) = f_i(x - \theta)$ $(i = 0, 1)$, and where it is desired to test $H_0$ against $H_1$.
One may also in this example replace the location parameter by a scale
parameter or have both parameters present.


It may be worth noting that the likelihood ratio test is invariant under any
transformation leaving the statistical problem invariant. In the problems
concerning normal distributions mentioned above, when there exists a uniformly
most powerful invariant test, it coincides with the likelihood ratio test. That
this is not so in general can be seen from Stein's example given in section 1.
There the problem remains invariant under multiplication of $X$ by $-1$, and
there exists a uniformly most powerful invariant test. However, the likelihood
ratio test is instead uniformly least powerful.

For certain purposes it is useful to consider a somewhat weaker definition
of invariance. We shall say that a function $f$ is almost invariant under
a group $G$ if for each $g \in G$, $f(gx) = f(x)$ for all $x$ except on
a set of measure zero, which may depend on $g$.

For many hypotheses one can find a group of transformations
such that among all tests invariant under this group there exists a
uniformly most powerful one. The question may
be raised whether this approach is unique, or whether there might exist some
other group also leaving the problem invariant but leading
to a different test. Also in problems where among all invariant tests there does
not exist a uniformly most powerful one, the question arises whether one is
using the totality of transformations leaving the problem invariant, or whether 
perhaps one can reduce the problem further. It therefore seems of interest to 
determine the totality of transformations leaving a given problem invariant. 
This was carried out for a few simple problems in [8]. 

We finally mention a connection between the notions of invariance and
sufficiency. Consider any problem in which the variables $X_1, \cdots, X_n$ are
independently identically distributed under all distributions of $\Omega$. Such a problem
clearly is left invariant under any permutation of the variables. Actually, these
transformations leave not only $\omega$ and $\Omega - \omega$ invariant but each point of $\Omega$
individually. No essential reduction of the problem is obtained since the maximal
invariant statistic is a sufficient statistic. It is easily seen that this will always
be the case when the transformations leave $\Omega$ pointwise invariant, but that in
this way one does not obtain all sufficient statistics. These can be obtained,
however, by considering more general transformations, where each point $x$ of
$\mathfrak{X}$ is transformed into the points of $\mathfrak{X}$ according to a probability distribution $P_x$.
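
For the permutation group the maximal invariant can be written down explicitly as the ordered sample. The short sketch below checks this on a three-point example and also evaluates the power sums $\sum x_i, \sum x_i^2, \cdots, \sum x_i^n$, which give an equivalent description; the sample values and the function names are illustrative assumptions.

```python
from itertools import permutations

def order_statistic(x):
    """Ordered sample: constant on each orbit of the permutation group."""
    return tuple(sorted(x))

def power_sums(x):
    """The sums of the k-th powers of the coordinates, k = 1, ..., n."""
    n = len(x)
    return tuple(sum(v ** k for v in x) for k in range(1, n + 1))

x = (3, 1, 2)                      # assumed sample; integers keep the sums exact
orbit = list(permutations(x))      # all points obtainable from x by permutation

ordered = {order_statistic(p) for p in orbit}
sums = {power_sums(p) for p in orbit}

# A point on a different orbit with the same total gives different values.
y = (1, 1, 4)
```

Both statistics take a single value on the orbit of `x` and separate it from the orbit of `y`.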


6. The principle of unbiasedness. As a second principle of reduction we shall
consider the principle of unbiasedness proposed by Neyman and Pearson. A
test is said to be unbiased [19] if

$P_\theta(\text{rejecting } H) \ge \alpha$ for all $\theta \in \Omega - \omega$.

This seems a desirable property for a test to have since it assures that there do
not exist $\theta_0$ in $\omega$ and $\theta_1$ in $\Omega - \omega$ for which

$P_{\theta_0}(\text{rejecting } H) > P_{\theta_1}(\text{rejecting } H)$.

We shall therefore be concerned in this section with the totality of tests $\phi$ for
which

(6.1) $E_\theta\phi(X) \le \alpha$ for all $\theta \in \omega$,
      $E_\theta\phi(X) \ge \alpha$ for all $\theta \in \Omega - \omega$.

For a number of important special cases there exists, among all tests satisfying
(6.1), one that is uniformly most powerful in $\Omega - \omega$ and uniformly least
powerful in $\omega$. (The latter property is of course very desirable since when $H$ is true
one wants to reject it as rarely as possible.) This follows immediately from well
known results concerning best similar tests since for the problems in question
$\Omega$ is a subset of a Euclidean space and for any test $\phi$, $E_\theta\phi(X)$ is a continuous
function of $\theta$. If then $A$ is the set of points that are boundary points both of
$\omega$ and of $\Omega - \omega$, it follows from (6.1) that

(6.2) $E_\theta\phi(X) = \alpha$ for all $\theta \in A$,

i.e., that $\phi$ is similar for $\theta$ in $A$. But if among all tests satisfying (6.2) there
exists one that is uniformly most powerful in $\Omega - \omega$ and uniformly least
powerful in $\omega$, it automatically satisfies (6.1) as is seen by comparison with the test
$\phi(X) \equiv \alpha$.

As an example suppose that $X_1, \cdots, X_n$ are independently normally
distributed with common mean $\xi$ and common variance $\sigma^2$. If the hypothesis is
$H_1: \sigma \le 1$ and the alternatives are $\sigma > 1$, the set $A$ becomes the line $\sigma = 1$.
As was shown by Neyman and Pearson [18], among all tests satisfying (6.2)
with this $A$, the test that rejects $H_1$ when $\sum (x_i - \bar{x})^2 > k$ (where $k$ is an
appropriately chosen constant) is uniformly most powerful for $\theta$ in $\Omega - \omega$, and
uniformly least powerful for $\theta$ in $\omega$.
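
Condition (6.1) can be verified numerically for this one-sided test. The sketch below assumes $n = 5$ and $\alpha = 0.05$, so that $\sum (x_i - \bar{x})^2/\sigma^2$ has a $\chi^2$-distribution with 4 degrees of freedom, for which the upper tail has a closed form; the function names and the bisection search for $k$ are illustrative choices, not part of the text.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with df degrees of freedom > x), closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def cutoff(alpha, df, lo=0.0, hi=200.0):
    """Bisection for k with P(chi-square_df > k) = alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(sigma, k, df):
    """Probability of rejecting H_1 (sum of squares > k) when the scale is sigma."""
    return chi2_sf_even(k / sigma ** 2, df)

n, alpha = 5, 0.05                 # assumed sample size and level
k = cutoff(alpha, n - 1)
powers = {sigma: power(sigma, k, n - 1) for sigma in (0.8, 1.0, 1.5)}
```

The power equals $\alpha$ on the boundary $\sigma = 1$, lies below $\alpha$ for $\sigma < 1$, and exceeds $\alpha$ for $\sigma > 1$, in agreement with (6.1) and (6.2).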

If instead we consider testing the hypothesis $H_2: \sigma = 1$ against the
alternatives $\sigma \ne 1$, we find that $A = \omega$, and our problem reduces to that of finding the
best test among all those that are similar in $\omega$ and unbiased. As is well known,
it turns out that rejecting when $\sum (x_i - \bar{x})^2 < k_1$ and when $\sum (x_i - \bar{x})^2 > k_2$
(where $k_1 < k_2$ are two appropriately chosen constants) is uniformly most
powerful among all similar unbiased tests.

A third hypothesis concerning $\sigma$ that might be of interest is $H_3: \sigma_1 \le \sigma \le \sigma_2$.
Here $A$ consists of the two lines $\sigma = \sigma_1$ and $\sigma = \sigma_2$ and it is easy to show
that the test that is uniformly most powerful in $\Omega - \omega$ and uniformly least
powerful in $\omega$ rejects $H_3$ if and only if $\sum (x_i - \bar{x})^2 < c_1$ or $\sum (x_i - \bar{x})^2 > c_2$, where again
$c_1 < c_2$ are two appropriately selected constants.

The question arises as to the connection of the principles of invariance and
unbiasedness. Clearly if there exists a unique test $\phi$ that is uniformly most
powerful unbiased, this test is invariant under any group $G$ leaving the problem
invariant. If then in addition there exists a uniformly most powerful invariant
(under $G$) test, this must coincide with $\phi$. Thus, if both principles lead to a
unique optimum solution, these solutions coincide.

We have seen that frequently optimum unbiased tests can be obtained
through a study of tests that are similar over certain sets in the parameter
space. The totality of similar tests was obtained for a number of important
problems by Neyman and Pearson. In his 1937 paper on confidence intervals
Neyman gave a general method for constructing similar regions with the
aid of sufficient statistics. Let $T$ be a sufficient statistic for $\theta \in A$. The
condition for $\phi$ to be similar with respect to $A$ and of size $\alpha$ is that

(6.3) $E_\theta\phi(X) = E_\theta E[\phi(X) \mid T] = \alpha$ for all $\theta \in A$,

i.e., that

(6.4) $E_\theta\{E[\phi(X) \mid T] - \alpha\} = 0$ for all $\theta \in A$.

Clearly any test $\phi$ for which

(6.5) $E[\phi(X) \mid t] = \alpha$ for almost all $t$

is similar. Tests satisfying (6.5) are said to have Neyman structure with respect
to $T$, and finding the tests of this structure is an
analytic problem the solution of which is known in many special cases. This
method was first employed by P. L. Hsu [3] for some problems concerning
normal distributions, and was extended to other cases in [7]. The present
general formulation was given by H. Scheffé and the author in [9] and [10]. We
shall say that a family of distributions $\{P^T_\theta\}$, $\theta \in A$, is boundedly complete if

(i) $f(t)$ is bounded,

(ii) $E_\theta f(T) = 0$ for all $\theta \in A$

imply $f(t) = 0$ except on a set $N$ with $P^T_\theta(N) = 0$ for all $\theta \in A$. Then we can
state

Theorem 6.1. A necessary and sufficient condition for the totality of tests
similar for $A$ to have Neyman structure with respect to a sufficient statistic $T$ is that
$\{P^T_\theta\}$, $\theta \in A$, be boundedly complete.

7. Tests whose power increases with the distance from the hypothesis.
Frequently, even among the unbiased tests, there does not exist a uniformly
most powerful one. The general univariate linear hypothesis with more than
one constraint is an example of this situation. The following extension of the
idea of unbiasedness may then be used to reduce the class of tests still further.
Unbiasedness distinguishes between values of $\theta$ as they belong to $\omega$ or $\Omega - \omega$.
However, one may further classify the points of $\Omega - \omega$ according to their
"distance" from $\omega$, and then ask of a test $\phi$ that the further $\theta$ be from $\omega$ the larger
be the power $\beta_\phi(\theta)$.

One possible such ordering of the alternatives is that induced by the envelope
power function. Here the envelope power at $\theta$ (Wald [24]) is defined by

(7.1) $\beta^*_\alpha(\theta) = \sup_\phi \beta_\phi(\theta)$,

where the supremum is taken over the class of all tests $\phi$ with $E_\theta\phi(X) \le \alpha$ for all $\theta \in \omega$. Of two points
$\theta_1$, $\theta_2$ one may then say that $\theta_1$ is closer to $\omega$ than $\theta_2$, equally close or less close,
as $\beta^*_\alpha(\theta_1)$ is less than, equal to or greater than $\beta^*_\alpha(\theta_2)$. The distance of $\theta$ from $\omega$
is thus measured by the ease with which one can detect that the hypothesis is
false when $\theta$ is the true parameter value.
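
The ordering induced by (7.1) can be computed explicitly in a simple case. The sketch below assumes a setting that is not discussed in the text: $X_1, \cdots, X_n$ normal with mean $\theta$ and unit variance, and $\omega$ consisting of the single point $\theta = 0$. There the envelope power at $\theta$ is the power of the most powerful level-$\alpha$ test of $0$ against $\theta$, namely $\Phi(|\theta|\sqrt{n} - z_{1-\alpha})$. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution

def envelope_power(theta, n, alpha):
    """Envelope power at theta for H: theta = 0, n normal observations, unit variance."""
    z = STD.inv_cdf(1.0 - alpha)
    return STD.cdf(abs(theta) * math.sqrt(n) - z)

def closer_to_hypothesis(theta1, theta2, n, alpha):
    """True when theta1 is closer to the hypothesis than theta2 in the sense of (7.1)."""
    return envelope_power(theta1, n, alpha) < envelope_power(theta2, n, alpha)

n, alpha = 9, 0.05   # assumed sample size and level
at_null = envelope_power(0.0, n, alpha)
ordering = closer_to_hypothesis(0.2, -0.5, n, alpha)
```

In this example the induced distance is simply $|\theta|$: the alternatives $0.3$ and $-0.3$ are equally close to the hypothesis, and $0.2$ is closer than $-0.5$.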

When $\theta$ lies in a Euclidean space and $\beta_\phi(\theta)$ is a continuous function of $\theta$ for
all $\phi$, as is the case in most applications, the condition that the power increase
with $\beta^*_\alpha$ will usually imply that $\beta_\phi(\theta_1) = \beta_\phi(\theta_2)$ whenever $\beta^*_\alpha(\theta_1) = \beta^*_\alpha(\theta_2)$. In
the case of the general linear hypothesis considered in section 5, for example,
one would obtain the condition that the power be a function only of
$\sum_{i=1}^{r} \xi_i^2 / \sigma^2$, where $\xi_i = E(X_i)$. As was shown by P. L. Hsu [3], the standard (likelihood
ratio) test is uniformly most powerful among all tests satisfying this condition.
Analogous remarks apply to Hotelling's $T^2$-problem, and to the hypothesis
specifying the value of the multiple correlation coefficient. The corresponding
optimum properties in these cases were proved by Simaika [21].

It is interesting to compare the above condition with that of invariance.
This comparison yields nothing of interest if the totality of tests is considered.
We may, however, restrict our attention to tests depending only on a sufficient
statistic $T$. We already know that $\phi(X)$ and $E[\phi(X) \mid T]$ have identical power.
In order to validate the comparison we wish to make, we state the following

Lemma. Let $T$ be a sufficient statistic for $\theta \in \Omega$, and let $G$ be a group of 1:1
transformations $g$ on $\mathfrak{X}$ leaving $\Omega$ invariant. Then if $\phi(x)$ is invariant under $G$,
$E[\phi(X) \mid t]$ is almost invariant under $G$.

We can now state the desired comparison in the following

Theorem 7.1. Let $G$ be a group of 1:1 transformations on $\mathfrak{X}$, let $\bar{G}$ be the
induced group of transformations on $\Omega$, let $v(\theta)$ be maximal invariant under $\bar{G}$, and
suppose that $\bar{G}$ leaves $\omega$ and $\Omega - \omega$ invariant. Suppose further that $T$ is a
sufficient statistic for $\Omega$, and that $\{P^T_\theta\}$, $\theta \in \Omega$, is boundedly complete. Then a necessary
and sufficient condition that the power of a test $\psi(T)$ be a function only of $v(\theta)$ is
that $\psi(t)$ be almost invariant under $G$.

This theorem is an immediate extension of some results of Wolfowitz [35].
Theorem 7.1 together with the results of section 5 proves that the standard
tests of the general linear hypothesis, Hotelling's $T^2$-problem and the
hypothesis concerning the multiple correlation coefficient possess the optimum property
that was obtained for these problems by Hsu and Simaika, respectively. The
method of proof indicated here is due to Wolfowitz [35].


8. Most stringent tests. We shall now turn to the third aspect of the theory:
Optimum properties defined with reference to the whole class of alternatives,
and attainable with no restrictions imposed on the class of tests. In the present
section we shall consider the property of stringency. Wald [25] defines a test $\phi$
to be most stringent if it minimizes

(8.1) $\sup_{\theta \in \Omega - \omega} [\beta^*_\alpha(\theta) - \beta_\phi(\theta)]$,

where $\beta^*_\alpha$ again denotes the envelope power, and $\beta_\phi$ the power of $\phi$. The rationale
of this definition is clear. The difference $\beta^*_\alpha(\theta) - \beta_\phi(\theta)$ measures the amount by
which the power of $\phi$ at $\theta$ falls short of the maximum power attainable there.

A theory of most stringent tests was developed by Hunt and Stein [5]. We
state their results in connection with the following groups of transformations.

(i) $gx = x + b$, $-\infty < b < \infty$, $x$ a real variable;

(ii) $gx = ax$, $0 < a$, $x$ a real variable;

(v)





Theorem 8.1. (Hunt and Stein). If $G$ is the direct product of a finite number
of groups of types (i)-(v), and if $G$ leaves the problem invariant, that is, if $G$
satisfies (5.4), then there exists a most stringent test invariant under $G$.

Actually, it is not necessary here to require that G be a direct product. The 
result holds also if the factoring of G is according to normal subgroups, where 
the normal subgroup at each stage and the final factor group are of the types 
mentioned. In the light of this one may omit type (iii) from the list since it has 
a normal subgroup of type (i) with factor group of type (ii). 

The proof of Theorem 8.1 is based on the following lemma, which has appli¬ 
cations to many related problems. 

Lemma (Hunt and Stein). If $G$ is a direct product of a finite number of groups
of types (i)-(v) then given any function $f$ over $\mathfrak{X}$ $(0 \le f(x) \le 1)$ there exists a
function $F$ $(0 \le F(x) \le 1)$ such that $F$ is invariant under $G$, and

(8.2) $\inf_{g \in G} \int f(gx)\phi(x)\, d\mu(x) \le \int F(x)\phi(x)\, d\mu(x) \le \sup_{g \in G} \int f(gx)\phi(x)\, d\mu(x)$

for all $\phi$ that are integrable $\mu$.
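
For a finite group the function $F$ of the lemma can be exhibited directly: the average of $f(gx)$ over the group is invariant, and its integral against any $\phi$ is an average of the integrals appearing in (8.2), hence lies between their infimum and supremum. The sketch below checks this on a five-point sample space with the two-element group generated by $x \to -x$. The sample space, the measure, and the functions $f$ and $\phi$ are assumptions for illustration, and this averaging argument is offered only as the elementary finite case, not as the proof used by Hunt and Stein.

```python
def group_average(f, group):
    """Return F(x) = average of f(g x) over the finite group."""
    def F(x):
        return sum(f(g(x)) for g in group) / len(group)
    return F

# Two-element group acting on the real line: identity and sign change.
group = [lambda x: x, lambda x: -x]

points = [-2, -1, 0, 1, 2]            # assumed finite sample space
mu = {x: 0.2 for x in points}         # assumed measure (uniform weights)

def f(x):
    """An arbitrary function with values in [0, 1]."""
    return 1.0 if x > 0 else 0.25

def phi(x):
    """An arbitrary integrable function."""
    return x + 3.0

def integral(fn):
    """Integral of fn(x) * phi(x) with respect to mu."""
    return sum(fn(x) * phi(x) * mu[x] for x in points)

F = group_average(f, group)
translates = [integral(lambda x, g=g: f(g(x))) for g in group]
middle = integral(F)
```

Here the two translated integrals are approximately 2.1 and 1.2, and the integral of $F\phi$ is their mean, approximately 1.65.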

It follows from Theorem 8.1 that if there exists a uniformly most powerful 
invariant test, this test is most stringent. In this way Hunt and Stein show, 
for example, (see in this connection section 5), that the likelihood ratio test of 
the general univariate linear hypothesis is most stringent. A question that is 
left open is the uniqueness of such a most stringent test. 
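
The quantity (8.1) can be evaluated numerically in a simple case. The following sketch assumes $n$ normal observations with unit variance and the hypothesis $\theta = 0$, a setting chosen for illustration and not treated in the text; it compares the maximum shortcoming of the one-sided and the equal-tails two-sided tests over a grid of alternatives. The grid and all names are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def envelope_power(theta, n, alpha):
    """Power of the most powerful level-alpha test of 0 against theta."""
    return STD.cdf(abs(theta) * math.sqrt(n) - STD.inv_cdf(1.0 - alpha))

def power_one_sided(theta, n, alpha):
    """Power of the test rejecting for large values of the sample mean."""
    return STD.cdf(theta * math.sqrt(n) - STD.inv_cdf(1.0 - alpha))

def power_two_sided(theta, n, alpha):
    """Power of the equal-tails test rejecting for large |sample mean|."""
    z = STD.inv_cdf(1.0 - alpha / 2.0)
    d = theta * math.sqrt(n)
    return STD.cdf(d - z) + STD.cdf(-d - z)

def max_shortcoming(power, thetas, n, alpha):
    """Largest value over the grid of envelope power minus actual power, as in (8.1)."""
    return max(envelope_power(t, n, alpha) - power(t, n, alpha) for t in thetas)

n, alpha = 9, 0.05
thetas = [k / 100.0 for k in range(-200, 201) if k != 0]   # grid of alternatives

short_one = max_shortcoming(power_one_sided, thetas, n, alpha)
short_two = max_shortcoming(power_two_sided, thetas, n, alpha)
```

The one-sided test has a maximum shortcoming close to 1, since it is nearly powerless against negative $\theta$, while the two-sided test, which is invariant under the sign change $x \to -x$, keeps its shortcoming near 0.125 on this grid.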

In general, the possibility therefore remains that there might exist another
most stringent test uniformly more powerful than the invariant one. In certain
particular cases this possibility can be ruled out by the following consideration.
Suppose that $\Omega$ is a subset of a Euclidean space and that every point of
$\omega$ is a limit point of $\Omega - \omega$. Suppose further that for any test $\phi$, $E_\theta\phi(X)$ is
continuous in $\theta$. Then clearly, if $\phi_1$ is similar of size $\alpha$ for testing $\omega$ and $\phi_2$ is of size
$\le \alpha$ but not similar, $\phi_2$ cannot be uniformly as powerful as $\phi_1$. Hence any test
that is admissible among all similar tests of size $\alpha$ is also admissible among the
totality of tests of size $\le \alpha$. Now admissibility among all similar tests is
sometimes not too difficult to prove. For the likelihood ratio test of the general
linear univariate hypothesis, for example, it is an immediate consequence of
the properties of this test proved by Wald [23] and Hsu [4].

The following alternative method for obtaining most stringent tests is also
mentioned by Hunt and Stein.

Theorem 8.2. (Hunt and Stein). Let $\Omega - \omega$ be partitioned into disjoint
subsets $\Omega_\delta$ such that $\beta^*_\alpha(\theta)$ is constant on each $\Omega_\delta$, and let $\phi_\delta$ be the test that maximizes
$\inf_{\theta \in \Omega_\delta} \beta_\phi(\theta)$. Then if $\phi_\delta = \phi$ is independent of $\delta$, $\phi$ is most stringent.

This result may be supplemented by the following method for finding tests
that maximize $\inf_{\theta \in \omega_1} \beta_\phi(\theta)$ over a given set of alternatives $\omega_1$ (not necessarily
satisfying the conditions imposed above on the $\Omega_\delta$'s).





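Theorem 8.3. Let $\omega_1$ be a given set of alternatives,
and consider probability measures $\lambda$ and $\lambda_1$ over $\omega$ and $\omega_1$ respectively. Let $f_\theta(x)$
be generalized probability densities with respect to $\mu$, so that $h(x) = \int_\omega f_\theta(x)\, d\lambda(\theta)$
and $h_1(x) = \int_{\omega_1} f_\theta(x)\, d\lambda_1(\theta)$ are again probability densities with respect to $\mu$. Let $\phi$
be the most powerful test of size $\alpha$ for testing the simple hypothesis $H: h$ against the
simple alternative $h_1$, and suppose that the power of $\phi$ against $h_1$ is $\beta$. Then if

$E_\theta\phi(X) \le \alpha$ for all $\theta \in \omega$,

$E_\theta\phi(X) \ge \beta$ for all $\theta \in \omega_1$,

$\phi$ maximizes $\inf_{\theta \in \omega_1} \beta_\phi(\theta)$ at level of significance $\alpha$.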

This method, when applicable, has the advantage of giving the totality of
most stringent tests (see in this connection Theorem 3.1) and hence of settling
the question of admissibility. However, in many applications probability
measures $\lambda$, $\lambda_1$ with the desired properties do not exist but instead only sequences
$\lambda^{(n)}$, $\lambda_1^{(n)}$, which satisfy the conditions in the limit. In this case again only the
weak conclusion is possible: The test obtained is most stringent but has not
been proved admissible. (For an example in which the analogous method has
been carried through in detail for an estimation problem, see [22].)

Actually, the two methods are closely related, as can be seen from the proof
of the main lemma. In those cases in which there exists a group $G$ giving the
maximum possible reduction, the group $\bar{G}$ induces a partition of $\Omega$ (through the
equivalence: $\theta_1 \sim \theta_2$ if there exists $\bar{g}$ such that $\theta_2 = \bar{g}\theta_1$), just into $\omega$ and the
sets $\Omega_\delta$. (This is so mainly because, as was shown by Hunt and Stein, the
envelope power remains invariant under any transformations that leave the
problem invariant.) Then the measures $\lambda$, $\lambda_1$ over $\omega$, $\Omega_\delta$ respectively, which figure in
the application of Theorems 8.2 and 8.3, become invariant measures over $\bar{G}$
through the obvious 1:1 mapping from $\omega$ and the $\Omega_\delta$'s respectively to $\bar{G}$. Thus
the second method will allow the strong conclusion when the group $G$ involved
in the first method possesses a finite invariant measure [types (iv) and (v)] but
not if any of its factors are of type (i)-(iii).

To conclude this section we shall give an example where the method of
invariance leads only to a partial reduction but where the solution may be
completed by certain additional considerations. Suppose that $(X_1, \cdots, X_n)$ is a
sample from a normal distribution with mean $\xi$ and variance $\sigma^2$, both unknown,
and that we wish to find the most stringent test of the hypothesis $H: \sigma = 1$
against the alternatives $\sigma \ne 1$. Theorem 8.1 reduces the problem to the
statistic $Y = \sum (X_i - \bar{X})^2$, but among the tests of $H$ based on this statistic there
does not exist a uniformly most powerful one. It may also be shown [8] that
no further reduction is possible by means of the method of invariance.

However, one may now consider the problem of finding the most stringent
test based on $Y$. (The envelope power function $\beta^*_\alpha(\xi, \sigma)$ that must be used
naturally is not the one for $Y$ but that for the original problem.) From an
argument given in [6] it follows that this test is of the form

$\phi_{k_1, k_2}$: reject when $Y < k_1$ or $> k_2$,

where $k_1$, $k_2$ are determined by the two conditions

(i) $P(\text{rejection} \mid \sigma = 1) = \alpha$,

(ii) $\sup_{\sigma < 1} [\beta^*_\alpha(\xi, \sigma) - \beta_{k_1, k_2}(\sigma)] = \sup_{\sigma > 1} [\beta^*_\alpha(\xi, \sigma) - \beta_{k_1, k_2}(\sigma)]$.

Here $\beta^*_\alpha(\xi, \sigma)$ is independent of $\xi$ and can be obtained from a table of the
$\chi^2$-distribution (with $n$ degrees of freedom for $\sigma < 1$ and $n - 1$ degrees of freedom
for $\sigma > 1$ as can be seen from (3.6)). Hence $k_1$ and $k_2$ can be computed fairly
easily.

Another problem that may be treated in this way is the hypothesis of equality
of variances for two normal samples. If the two samples are of equal size, there
exists a uniformly most powerful invariant test for a suitable group of
transformations. However, if the sample sizes are different the method of invariance
reduces the problem only to $\sum (X_i - \bar{X})^2 / \sum (Y_i - \bar{Y})^2$, and the cut-off points
giving the most stringent test may be determined by an argument analogous
to that given above.

This method may be extended to allow determination of most stringent tests
of hypotheses such as $H: \sigma_1 \le \sigma \le \sigma_2$. This requires a certain modification
of Theorem 1 of [6], which is easily obtained. One finds again that one may
restrict consideration to a one-parameter family of tests (determined by a
somewhat different condition than above), and that among these the most
stringent test is obtained by the analogue of condition (ii) above.

It should be mentioned that the results of [6] apply also to the hypothesis
specifying the value of the parameter in a binomial or Poisson distribution.
This is easily seen since in either case the distributions of $\Omega$ are absolutely
continuous with respect to a common sigma-finite measure and since for the
appropriate choice of this measure the generalized density is of the form assumed
for the density in [6]. Hence in both the binomial and the Poisson case the most
stringent test is determined by conditions analogous to (i) and (ii) above.

9. Tests that minimize the maximum loss. In the Neyman-Pearson theory 
one classifies the errors into two kinds: Rejecting the hypothesis when it is 
true, accepting it when it is false. One may however analyze the situation further 
and distinguish, say, between accepting when one or some other alternative is 
true. Thus one is led to introduce the losses that result in a given situation from 
the various possible errors, and to look for a test that, in an appropriate sense, 
minimizes the expected loss. This possibility was mentioned by Neyman and 
Pearson [17], and was taken as the starting point of his general theory by Wald 
(see for example [24]). 

In order to stay within the framework of this exposition we shall here
introduce losses only for the errors of accepting the hypothesis when it is false,
while still demanding that the probability of rejection when the hypothesis is
true should not exceed $\alpha$. Actually, there are many cases where this seems to
be a reasonable formulation. For it frequently happens that the two types of
error entail consequences of such completely different nature that the resulting
losses cannot be measured on a common scale while usually the different errors
of the same type are comparable.

We shall therefore assume that for each $\theta \in \Omega - \omega$ there is defined a $W(\theta)$,
which measures the loss resulting from acceptance of $H$ when $\theta$ is true. The
risk which one runs by using a test $\phi$, when $\theta \in \Omega - \omega$ is the true parameter
value, is given by the expected loss $R_\phi(\theta) = W(\theta) E_\theta[1 - \phi(X)]$. When a
uniformly most powerful test exists for the hypothesis in question, this test also
minimizes the expected loss uniformly for $\theta$ in $\Omega - \omega$. In the contrary case one
may again restrict the class of tests in some way, so that within the restricted
class there exists a uniformly most powerful test, and hence a test that
uniformly minimizes the expected loss. Alternatively we may again consider some
optimum property of the risk function $R_\phi(\theta)$ as a whole. We shall here consider
the minimax principle introduced by Wald, and seek a test which, subject to
$E_\theta\phi(X) \le \alpha$ for all $\theta \in \omega$, minimizes

$\sup_{\theta \in \Omega - \omega} W(\theta) E_\theta[1 - \phi(X)]$,

the maximum risk.

If one introduces losses also for the other type of error it is easy to see that
for a suitably chosen loss function the definition of minimax expected loss
coincides with that of stringency. It is therefore not surprising that the methods
of the previous section can be extended to cover the problems considered in
the present one. (They are actually much more general, and may be applied
also, for example, to the problem of point estimation, and in fact to the general
decision problem.)

From the lemma of Hunt and Stein stated in the previous section we
immediately obtain the following extension of Theorem 8.1.

Theorem 9.1. If $G$ is a group of transformations leaving the hypothesis and
the class of alternatives invariant, if $G$ can be factored according to normal subgroups into
factors of types (i)-(v), and if the loss function $W(\theta)$ is invariant under $\bar{G}$, then
there exists a test $\phi$ invariant under $G$ and minimizing

(9.1) $\sup_{\theta \in \Omega - \omega} W(\theta) E_\theta[1 - \phi(X)]$.

It follows that when a uniformly most powerful invariant test exists this
test also minimizes (9.1). Theorem 8.2 admits a similar extension if one
replaces the sets $\Omega_\delta$ by sets over which $W(\theta)$ is constant.





Again it may happen that the method of invariance does not reduce the
problem sufficiently far but that the solution may be completed by other
considerations. Let us once more consider the hypothesis $H: \sigma = 1$ of the previous section,
and let us suppose that the loss function has the necessary invariance property,
so that it is a function only of $\sigma$ but not of the unknown mean. It follows from
Theorem 9.1 that there exists a test minimizing the maximum risk, which is a
function only of $Y = \sum (X_i - \bar{X})^2$. From [6] it is easily seen that a test $\phi$
which rejects when $Y < k_1$ or $> k_2$ has the desired property if its size is $\alpha$ and
if in addition

(9.2) $\sup_{\sigma < 1} W(\sigma) E_\sigma[1 - \phi(Y)] = \sup_{\sigma > 1} W(\sigma) E_\sigma[1 - \phi(Y)]$.

It follows that depending on the choice of $W(\sigma)$ the solution may be any member
of the one-parameter family of tests $\phi_{k_1, k_2}$ of size $\alpha$.

Under the conditions of Theorem 9.1, when a uniformly most powerful
invariant test exists, this also maximizes the average power for a large class of
weight functions. If there exists a common finite invariant measure over the
sets $\Omega_\delta$ in the sense indicated in section 8, the uniformly most powerful invariant
test will maximize the average power with this measure as weight function, over
$\Omega_\delta$ for all $\delta$. It follows that it maximizes the average power over $\Omega - \omega$ with
respect to any weight function for which the conditional distribution over each
$\Omega_\delta$ is the above invariant measure. If the invariant measure over the $\Omega_\delta$'s is not
finite one can obtain analogous results with respect to a sequence of weight
functions invariant in the limit. The results indicated here are much weaker than
those obtained for the general linear univariate hypothesis by Wald [23] and
Hsu [4] under the restriction to similar regions. However their results are no
longer valid when this restriction is omitted.

10. Applications to sequential analysis. So far we have restricted
consideration to the case that the hypothesis is to be tested on the basis of a preassigned
experiment. However, frequently there is available for this purpose a large class
of experiments, and the selection of an optimum experiment out of this class is
part of the problem. We shall consider here only the following situation, which
has recently been studied extensively (see Wald [28, 29]). There is given a
sequence of random variables $X_1, X_2, \cdots$ whose joint distribution is known to
belong to some family $\mathcal{F} = \{P_\theta\}$, $\theta \in \Omega$; the hypothesis specifies some subfamily:
$\theta \in \omega$. The $X$'s are observed one by one, and the decision, whether or not to
continue experimentation at any given stage, is allowed to depend on the
observations taken up to that point. Thus the number $n$ of observations that will be
taken is a random variable whose distribution depends on $\theta$. Usually, by an
appropriate choice of stopping rule, there may be effected a considerable saving
in the expectation of the number of observations necessary to achieve a given
discrimination between hypothesis and alternatives. The problem is to
determine the stopping rule and test that minimizes this expectation.

As we have seen in the previous sections the principal methods for obtaining
optimum tests consist in reducing the problem to that of testing a simple hypothesis
against a simple alternative. This basic problem was solved in the non-sequential
case by Neyman and Pearson (Theorem 3.1). The solution of the
much more difficult corresponding sequential problem was obtained for a large
class of cases by Wald and Wolfowitz [31] in the following

Theorem 10.1. Let $X_1, X_2, \cdots$ be identically and independently distributed.
It is desired to test the hypothesis that the common probability density of the X's is
$f(x)$ against the alternative that it is $g(x)$. Given two numbers $0 < \alpha < \beta < 1$, there
exists a test which, subject to the condition

(10.1)
\[
\begin{aligned}
P(\text{rejection} \mid f) &\le \alpha, \\
P(\text{rejection} \mid g) &\ge \beta,
\end{aligned}
\]

minimizes simultaneously $E_f(n)$ and $E_g(n)$, the expected number of observations
computed for the distributions f and g. This test is given in terms of two numbers
A and B by the following rule. After m observations have been taken,


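\[
\begin{aligned}
&\text{reject if} && \frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} \ge A, \\
&\text{accept if} && \frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} \le B, \\
&\text{take another observation if} && B < \frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} < A.
\end{aligned}
\]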


Here A and B are determined so that condition (10.1) holds with the inequality 
signs replaced by equality. 
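
The rule of Theorem 10.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sprt` and `normal_pdf`, the use of the logarithm of the probability ratio, and the two unit-variance normal densities in the usage lines are not from the text, and the boundaries A and B are simply taken as given rather than determined from (10.1).

```python
import math


def normal_pdf(x, mean):
    """Density of a normal distribution with unit variance (illustrative choice)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def sprt(xs, f, g, A, B):
    """Apply the probability ratio rule to the observations xs, one at a time.

    f and g are the densities under the hypothesis and the alternative.
    Returns (decision, m), where m is the number of observations used and
    decision is "reject", "accept", or "continue" if xs ran out first.
    """
    if not 0.0 < B < A:
        raise ValueError("need 0 < B < A")
    log_a, log_b = math.log(A), math.log(B)
    log_ratio = 0.0  # log of g(x_1)...g(x_m) / f(x_1)...f(x_m)
    m = 0
    for x in xs:
        m += 1
        log_ratio += math.log(g(x)) - math.log(f(x))
        if log_ratio >= log_a:
            return "reject", m
        if log_ratio <= log_b:
            return "accept", m
    return "continue", m


# Hypothetical usage: f is N(0, 1), g is N(1, 1), so each observation x
# adds x - 1/2 to the log ratio.
f = lambda x: normal_pdf(x, 0.0)
g = lambda x: normal_pdf(x, 1.0)
decision, m = sprt([1.5, 1.5, 1.5, 1.5], f, g, A=10.0, B=0.1)
```

Working with the logarithm only avoids underflow of long products; the decisions are the same as those of the rule stated in terms of the ratio itself.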

So as to be able to treat the various problems considered non-sequentially in
the previous sections one would have to extend this theorem at least to the case
that the variables $X_1, X_2, \cdots$ form a set of equivalent variables in the sense
of de Finetti [1]. Instead, we shall here restrict ourselves to a few problems that
can be solved on the basis of Theorem 10.1. All of the tests discussed below were
obtained from various points of view, and some of their properties were discussed
by Girshick in his important "Contributions to the theory of sequential analysis,"
Annals of Math. Stat., Vol. 17 (1946), pp. 123-143 and 282-298, and by Wald
in his basic book on the subject [28].

We shall modify slightly the formulation of the problem of
hypothesis testing. Let the parameter space $\Omega$ be divided into three parts:
the hypothesis $\omega_0$, the class of alternatives $\omega_1$, and a region of
indifference. Let us denote the sequential random variable
$(X_1, \cdots, X_n)$ by $X$. We wish to determine a sequential test $\varphi$ which,
subject to

(10.2)
\[
\begin{aligned}
E_\theta \varphi(X) &\le \alpha \quad \text{for } \theta \in \omega_0, \\
E_\theta \varphi(X) &\ge \beta \quad \text{for } \theta \in \omega_1,
\end{aligned}
\]

minimizes $\sup_{\omega_0 + \omega_1} E_\theta(n)$. (Actually, this is a rather artificial formulation. The
natural requirement is the minimization of $\sup_{\theta \in \Omega} E_\theta(n)$ but this is a much more
difficult problem.) The reduction to the problem of testing a simple hypothesis
against a simple alternative is achieved by the following obvious extension of
Theorem 8.3.

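Theorem 10.2. Let $\lambda_0$, $\lambda_1$ be distributions over $\omega_0$, $\omega_1$ respectively, and let $\varphi$ be
a test, which subject to

(10.3)
\[
\begin{aligned}
\int E_\theta \varphi(X) \, d\lambda_0(\theta) &\le \alpha, \\
\int E_\theta \varphi(X) \, d\lambda_1(\theta) &\ge \beta,
\end{aligned}
\]

minimizes $\sup_{i=0,1} \int E_\theta(n) \, d\lambda_i(\theta)$. Then if $\varphi$ satisfies (10.2) and

(10.4)
\[
E_\theta(n) \le \sup_{i=0,1} \int E_\theta(n) \, d\lambda_i(\theta) \quad \text{for all } \theta \in \omega_0 + \omega_1,
\]

$\varphi$ minimizes $\sup_{\omega_0 + \omega_1} E_\theta(n)$ subject to (10.2).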

As in section 3 we can make certain trivial applications to problems concerning
a single real parameter such as testing the hypothesis $H: p \le p_0$ against the
alternatives $p \ge p_1$ $(p_0 < p_1)$, where $p$ is the probability of success in a binomial
sequence of trials. In this example condition (10.2) of Theorem 10.2 obviously
is satisfied when $\lambda_0$ and $\lambda_1$ assign probability 1 to $p_0$ and $p_1$ respectively. Hence
the probability ratio test for testing $p = p_0$ against $p = p_1$ has the desired properties
whenever (10.4) holds, that is, whenever $E_p(n)$ attains its maximum
between $p_0$ and $p_1$.

The following is another example that may be solved in this manner. Let
$X_1, X_2, \cdots$; $Y_1, Y_2, \cdots$ be independently normally distributed, all with
unit variance and means $E(X_i) = \xi$, $E(Y_i) = \eta$. In order to test the hypothesis
$H: \xi \ge \eta$ against the alternatives $\eta - \xi \ge \delta$ where $\delta > 0$ is given, a pair $(X_1, Y_1)$
is observed. If after this observation experimentation continues another pair
$(X_2, Y_2)$ is observed, etc. In this case we may take for $\lambda_0$, $\lambda_1$ the distributions
that assign probability 1 to the parameter points $(\xi, \eta) = (0, 0)$ and $(-\delta/2, \delta/2)$
respectively. Then the probability ratio after m observations is given by

(10.5)
\[
\frac{\exp\left[-\tfrac{1}{2} \sum_{i=1}^{m} \left(x_i + \tfrac{\delta}{2}\right)^2 - \tfrac{1}{2} \sum_{i=1}^{m} \left(y_i - \tfrac{\delta}{2}\right)^2\right]}
     {\exp\left[-\tfrac{1}{2} \sum_{i=1}^{m} x_i^2 - \tfrac{1}{2} \sum_{i=1}^{m} y_i^2\right]}.
\]

Since the distribution of $Y - X$ depends only on $\eta - \xi$ it is easily seen that
condition (10.2) is satisfied.
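
As a numerical check of this example, the following Python sketch computes the logarithm of the probability ratio for the parameter points $(0, 0)$ and $(-\delta/2, \delta/2)$ in two ways: directly from the two unit-variance normal densities, and from the simplified form $(\delta/2) \sum (y_i - x_i) - m\delta^2/4$ obtained by expanding the squares. The function names and the sample values are illustrative assumptions, not part of the text.

```python
import math


def log_ratio_direct(xs, ys, delta):
    """Log of the density ratio, point (-delta/2, delta/2) over point (0, 0)."""
    total = 0.0
    for x, y in zip(xs, ys):
        alt = -0.5 * ((x + delta / 2.0) ** 2 + (y - delta / 2.0) ** 2)
        hyp = -0.5 * (x ** 2 + y ** 2)
        total += alt - hyp
    return total


def log_ratio_closed_form(xs, ys, delta):
    """Same quantity after expanding the squares: it depends on the
    observations only through the sum of the differences y_i - x_i."""
    m = len(xs)
    diff_sum = sum(y - x for x, y in zip(xs, ys))
    return 0.5 * delta * diff_sum - m * delta ** 2 / 4.0


# Hypothetical sample of m = 3 pairs.
xs = [0.2, -1.0, 0.7]
ys = [1.1, 0.4, 0.3]
direct = log_ratio_direct(xs, ys, delta=1.0)
closed = log_ratio_closed_form(xs, ys, delta=1.0)
```

The agreement of the two values illustrates why the test depends only on the differences $Y_i - X_i$.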

Some further results can be obtained through extension to the sequential case 
of Theorems 8.1 and 9.1. 



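Theorem 10.3. Suppose that G is of the type described in Theorem 9.1, let $Y =
f(X_1, X_2, \cdots)$ be maximal invariant under $G$, let $v(\theta)$ be maximal invariant
under $\bar{G}$, and let the set of values of $v(\theta)$ corresponding to $\omega_0$ and $\omega_1$ be $\bar{\omega}_0$ and $\bar{\omega}_1$,
respectively. If among all tests of $\bar{\omega}_0$ against $\bar{\omega}_1$ based on $Y$, the test $\varphi$ minimizes
$\sup_{\bar{\omega}_0 + \bar{\omega}_1} E_\theta(n)$ subject to

(10.6)
\[
\begin{aligned}
E_\theta \varphi(Y) &\le \alpha \quad \text{if } v(\theta) \in \bar{\omega}_0, \\
E_\theta \varphi(Y) &\ge \beta \quad \text{if } v(\theta) \in \bar{\omega}_1,
\end{aligned}
\]

then $\varphi$ also minimizes $\sup_{\omega_0 + \omega_1} E_\theta(n)$ among all tests based on the X's and which satisfy
(10.2).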

As an example consider the problem of testing the hypothesis $\sigma \le \sigma_0$ against
the alternatives $\sigma \ge \sigma_1$ $(\sigma_0 < \sigma_1)$ when the X's are identically, independently
normally distributed with unknown mean and variance. Since the problem remains
invariant under a common translation of the X's we can take for $Y$ of
the theorem $Y = (X_2 - X_1, X_3 - X_1, \cdots)$. Equivalently we may take as our
new sequence of variables $(Y_1, Y_2, \cdots)$ where

(10.7)
\[
Y_k = \frac{k X_{k+1} - (X_1 + \cdots + X_k)}{\sqrt{k(k+1)}}.
\]

Then $Y_1, Y_2, \cdots$ are independently normally distributed with zero mean and
the same variance as the X's. Hence the problem reduces to a type which we have
already considered. The optimum test is based on

\[
\sum_{i=1}^{m} Y_i^2 = \sum_{i=1}^{m+1} \left( X_i - \frac{X_1 + \cdots + X_{m+1}}{m+1} \right)^2.
\]
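
The identity between the sum of squares of the transformed variables (10.7) and the sum of squared deviations of the X's from their mean can be checked numerically. The following Python sketch does so for a small hypothetical sample; the function names and the data are illustrative assumptions only.

```python
import math


def transform(xs):
    """Return Y_1, ..., Y_m of (10.7) for observations X_1, ..., X_{m+1}."""
    ys = []
    partial_sum = 0.0  # X_1 + ... + X_k
    for k in range(1, len(xs)):
        partial_sum += xs[k - 1]
        # xs[k] is X_{k+1} because Python lists are indexed from 0
        ys.append((k * xs[k] - partial_sum) / math.sqrt(k * (k + 1)))
    return ys


def sum_sq_about_mean(xs):
    """Sum of squared deviations of the observations from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Hypothetical data with m + 1 = 4 observations.
xs = [1.0, 2.0, 4.0, 7.0]
ys = transform(xs)
lhs = sum(y * y for y in ys)  # sum of Y_i^2, i = 1, ..., m
rhs = sum_sq_about_mean(xs)   # sum of (X_i - mean)^2, i = 1, ..., m + 1
```

Both quantities equal 21 for this sample, up to rounding error.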

It may be worth pointing out that Theorems 3.2, 8.3, 10.2 all are special
cases of simple results in the general theory of statistical decision functions, of
which the following is the prototype. (For a detailed treatment of this theory
see, for example, [30].) Let $\{P_\theta\}$, $\theta \in \Omega$, be the family of possible distributions of
a random variable $X$, and let $\{\delta\}$ be a family of decision functions. The loss
resulting from the use of $\delta(x)$ when $P_\theta$ is the true distribution is $W[\theta, \delta(x)]$ and
the risk function associated with $\delta$ is $R_\delta(\theta) = E_\theta W[\theta, \delta(X)]$. Let $\lambda$ be a probability
measure over $\Omega$, and let $\delta_\lambda$ be a decision function that minimizes $\int R_\delta(\theta) \, d\lambda(\theta)$.
Then if $\lambda$ is such that

(10.8)
\[
R_{\delta_\lambda}(\theta) \le \int R_{\delta_\lambda}(\theta) \, d\lambda(\theta) \quad \text{for all } \theta \in \Omega,
\]

$\delta_\lambda$ minimizes $\sup_\theta R_\delta(\theta)$.

Proof. Let $\delta^*$ be any other decision function. Then

\[
\sup_\theta R_{\delta_\lambda}(\theta) \le \int R_{\delta_\lambda}(\theta) \, d\lambda(\theta) \le \int R_{\delta^*}(\theta) \, d\lambda(\theta) \le \sup_\theta R_{\delta^*}(\theta).
\]
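
The argument can be illustrated on a finite problem. In the Python sketch below the risk functions of three decision functions at two parameter points are given as a small table; the function name, the risk values and the measure $\lambda$ are hypothetical and chosen only so that condition (10.8) holds. The sketch finds the decision function minimizing the average risk, checks (10.8), and compares the result with the decision function of smallest maximum risk.

```python
def bayes_and_minimax(risk, prior):
    """risk[d][t] is the risk of decision function d at parameter point t;
    prior[t] is the weight that the measure lambda assigns to point t.

    Returns (index of the rule minimizing average risk,
             whether condition (10.8) holds for it,
             index of the rule with the smallest maximum risk).
    """
    average = [sum(p * r for p, r in zip(prior, row)) for row in risk]
    d_bayes = min(range(len(risk)), key=lambda d: average[d])
    # (10.8): the risk of the minimizing rule never exceeds its average risk
    condition = all(r <= average[d_bayes] + 1e-12 for r in risk[d_bayes])
    max_risk = [max(row) for row in risk]
    d_minimax = min(range(len(risk)), key=lambda d: max_risk[d])
    return d_bayes, condition, d_minimax


# Hypothetical risk table: three decision functions, two parameter points.
risk = [
    [1.0, 4.0],
    [2.0, 2.0],
    [5.0, 1.0],
]
prior = [0.5, 0.5]
d_bayes, condition, d_minimax = bayes_and_minimax(risk, prior)
```

Here the second decision function minimizes the average risk, has constant risk so that (10.8) holds, and is also the one with the smallest maximum risk, as the proof above requires.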

In an analogous manner one can give an extension of Theorems 8.1, 9.1, 10.3.





11. Two sided tests considered as 3-decision problems. In a number of 
important special problems the hypothesis specifies the value of a real valued 
parameter or states that this parameter lies in a certain interval, and it is desired 
to test this hypothesis against the obvious two-sided class of alternatives. It 
seems that in nearly any problem of this kind that would arise in practice one 
would want to decide when rejecting the hypothesis, whether the true parameter 
value lies below or above the hypothetical ones. If for example one rejects the 
hypothesis that the means of two normal populations are equal, one usually 
wants to decide which of the two is larger. It would therefore seem most natural 
to formulate such problems as 3-decision problems. 

Problems of this kind, as all problems of hypothesis testing, naturally are 
special cases of the general decision problem formulated by Wald. We shall here 
consider the case that upper bounds are given for the probabilities of certain 
types of errors and thereby obtain a formulation, which is closely analogous to 
the classical formulation of hypothesis testing discussed in this paper, and which
will allow immediate application of a large portion of the theory discussed here. 

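Consider the case that $\Omega$ is partitioned into 3 parts, $\omega$, $\omega_1$, $\omega_2$ where in a certain
sense $\omega$ lies between $\omega_1$ and $\omega_2$. We wish to test the hypothesis $H: \theta \in \omega$. When we
reject the hypothesis, we shall reach either decision $D_1$ that $\theta \in \omega_1$ or decision $D_2$
that $\theta \in \omega_2$. Correspondingly we prescribe two positive numbers $\alpha_1$, $\alpha_2$ and impose
the restriction that

(11.1)
\[
\begin{aligned}
P_\theta(D_1) &\le \alpha_1 \quad \text{if } \theta \in \omega + \omega_2, \\
P_\theta(D_2) &\le \alpha_2 \quad \text{if } \theta \in \omega + \omega_1.
\end{aligned}
\]

Subject to this condition it is desired to maximize

(11.2)
\[
\begin{aligned}
&P_\theta(D_1) \quad \text{for } \theta \in \omega_1, \\
&P_\theta(D_2) \quad \text{for } \theta \in \omega_2.
\end{aligned}
\]

A test will now consist of two non-negative functions $\varphi_1$ and $\varphi_2$ satisfying

(11.3)
\[
\varphi_1(x) + \varphi_2(x) \le 1,
\]

with the convention that when $X = x$ the decision $D_i$ will be taken with probability
$\varphi_i(x)$ $(i = 1, 2)$.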

There is no difficulty concerning the extension of the notions of invariance
or sufficient statistic; in fact these notions obviously apply to the general decision
problem. The notion of unbiasedness is extended in the obvious way by the
condition

(11.4)
\[
\begin{aligned}
P_\theta(D_1) &\ge \alpha_1 \quad \text{for } \theta \in \omega_1, \\
P_\theta(D_2) &\ge \alpha_2 \quad \text{for } \theta \in \omega_2.
\end{aligned}
\]


One then obtains the following

Theorem 11.1. Suppose that for testing the hypothesis $H_1: \theta \in \omega + \omega_2$ against
the alternatives $\theta \in \omega_1$ at level of significance $\alpha_1$, the test $\varphi_1$ among all unbiased tests
is uniformly most powerful in $\omega_1$ and uniformly least powerful in $\omega_2$, and that
$\varphi_2$ has the analogous property for testing $H_2: \theta \in \omega + \omega_1$ against $\theta \in \omega_2$ at significance
level $\alpha_2$. If $\varphi_1(x) + \varphi_2(x) \le 1$ for all $x$, then among all procedures satisfying (11.1)
and (11.4), the procedure $(\varphi_1, \varphi_2)$ uniformly maximizes the probability of a correct
decision. (If the tests $\varphi_1$, $\varphi_2$ take on only the values 0 and 1, the condition $\varphi_1(x) +
\varphi_2(x) \le 1$ states that the rejection region of each of the two hypotheses is contained
in the acceptance region of the other.)

As an example consider the case that $X_1, \cdots, X_n$ are independently, normally
distributed with common mean $\xi$ and variance $\sigma^2$. Suppose we wish to
test the hypothesis that $\sigma_1 \le \sigma \le \sigma_2$ where $\sigma_1$ may equal $\sigma_2$. Then it follows from
Theorem 11.1 that among all unbiased procedures of level $(\alpha_1, \alpha_2)$, there exists
one that maximizes the probability of a correct decision uniformly in $\xi$, $\sigma$.
This is the procedure under which decision $D_1$ or $D_2$ is taken as $\sum (x_i - \bar{x})^2 \le k_1$
or $\ge k_2$, and the hypothesis is accepted otherwise. Here the k's are determined by

\[
\begin{aligned}
P\left( \sum (x_i - \bar{x})^2 \le k_1 \mid \sigma_1 \right) &= \alpha_1, \\
P\left( \sum (x_i - \bar{x})^2 \ge k_2 \mid \sigma_2 \right) &= \alpha_2.
\end{aligned}
\]
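
A minimal Python sketch of this three-decision procedure follows. It rests on the standard fact that $\sum (x_i - \bar{x})^2 / \sigma^2$ has the chi-square distribution with $n - 1$ degrees of freedom. To stay within the standard library the sketch assumes that $n - 1$ is even, for which the chi-square distribution function has a finite closed form, and finds the constants $k_1$, $k_2$ by bisection. All function names and the sample values are illustrative assumptions.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square distribution function for an even number df of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only even, positive df")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, df):
    """Solve chi2_cdf_even(q, df) = p for q by bisection, 0 < p < 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def three_decision(xs, sigma1, sigma2, alpha1, alpha2):
    """Return "D1" (sigma below sigma1), "D2" (sigma above sigma2) or "accept"."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    k1 = sigma1 ** 2 * chi2_quantile_even(alpha1, n - 1)
    k2 = sigma2 ** 2 * chi2_quantile_even(1.0 - alpha2, n - 1)
    if k1 >= k2:
        raise ValueError("the two rejection regions overlap")
    if ss <= k1:
        return "D1"
    if ss >= k2:
        return "D2"
    return "accept"


# Hypothetical usage with n = 3 observations and sigma1 = sigma2 = 1.
result = three_decision([-1.0, 0.0, 1.0], 1.0, 1.0, 0.05, 0.05)
```

The check `k1 >= k2` corresponds to the requirement $\varphi_1(x) + \varphi_2(x) \le 1$ of Theorem 11.1, namely that the two rejection regions do not overlap.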


REFERENCES 

[1] B. de Finetti, "La prévision: ses lois logiques, ses sources subjectives," Annales de
l'Institut Henri Poincaré, Vol. 7 (1937), p. 1.

[2] P. R. Halmos and L. J. Savage, "Applications of the Radon-Nikodym theorem to the
theory of sufficient statistics," Annals of Math. Stat., Vol. 20 (1949), p. 225.

[3] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika,
Vol. 32 (1941), p. 62.

[4] P. L. Hsu, "On the power function of the E²-test and the T²-test," Annals of Math.
Stat., Vol. 16 (1945), p. 278.

[5] G. Hunt and C. Stein, "Most stringent tests of statistical hypotheses," unpublished.

[6] E. L. Lehmann, "On families of admissible tests," Annals of Math. Stat., Vol. 18
(1947), p. 97.

1,1 hI ' poU ''“" , witb “• 

1,1 01 p™u«» i»- 

” B 'ifJsZvTf'aSlTm 0 " P "““ "*“-••• '*'»• »«■ 

" 01 E ' “wbfclfd' Se “" 6 ’ «*«■ *»J „«m.. 

[11] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses. I. Normal
distributions," Annals of Math. Stat., Vol. 19 (1948), p. 495.

[12] E. L. Lehmann and C. Stein, "On the theory of some non-parametric hypotheses,"
Annals of Math. Stat., Vol. 20 (1949), p. 28.

m J - *T2 2rvVTSr t “”S e “*«««>.■■ <«*,. 

J h ” oth4 *“ “»r—a». 

'it Sibito,”°i* IZl j LTaff “f im “ io “ b *“ d ll “ blmibbl lleoty 

1161 I- Nbym.n m k s. A - ™- 239 < lw >. »• M3. 



[17] J. Neyman and E. S. Pearson, "On the testing of statistical hypotheses in relation to probability a priori," Proc. Camb. Phil. Soc., Vol. 29 (1933), p. 492.
[18] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Phil. Trans. Roy. Soc. London, Series A, Vol. 231 (1933).
[19] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses. I. Unbiased critical regions of type A and type A1," Stat. Res. Mem., Vol. 1 (1936), p. 1.
[20] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters," Biometrika, Vol. 31 (1939), p. 200.
[21] J. B. Simaika, "On an optimum property of two important statistical tests," Biometrika, Vol. 32 (1941), p. 70.
[22] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), p. 427.
[23] A. Wald, "On the power function of the analysis of variance test," Annals of Math. Stat., Vol. 13 (1942), p. 434.
[24] A. Wald, "On the principles of statistical inference," Notre Dame Math. Lectures, No. 1 (1942).
[25] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Trans. Am. Math. Soc., Vol. 54 (1943), p. 426.
[26] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.
[27] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 549.
[28] A. Wald, Sequential analysis, John Wiley and Sons, 1947.
[29] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), p. 279.
[30] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), p. 165.
[31] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio test," Annals of Math. Stat., Vol. 19 (1948), p. 326.
[32] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), p. 60.
[33] J. Wolfowitz, "Additive partition functions and a class of statistical hypotheses," Annals of Math. Stat., Vol. 13 (1942), p. 247.
[34] J. Wolfowitz, "Non-parametric statistical inference," Proceedings of the Berkeley Symposium on mathematical statistics and probability (1949), p. 93.
[35] J. Wolfowitz, "The power of the classical tests associated with the normal distribution," Annals of Math. Stat., Vol. 20 (1949), p. 540.

Some related papers not referred to in the text.

[36] T. W. Anderson, "On the theory of testing serial correlation," Skand. Aktuarietidskrift, (1948), p. 88.
[37] R. A. Fisher, The design of experiments, Oliver and Boyd, 1935.
[38] M. N. Ghosh, "On the problem of similar regions," Sankhya, Vol. 8 (1948), p. 329.
[39] P. G. Hoel, "Testing the homogeneity of Poisson frequencies," Annals of Math. Stat., Vol. 16 (1945), p. 362.
[40] P. G. Hoel, "Discriminating between binomial distributions," Annals of Math. Stat., Vol. 18 (1947), p. 556.
[41] P. G. Hoel, "On the uniqueness of similar regions," Annals of Math. Stat., Vol. 19 (1948), p. 60.
[42] E. L. Lehmann, "Some comments on large sample tests," Proceedings of the Berkeley Symposium on mathematical statistics and probability (1949), p. 451.
[43] H. B. Mann and A. Wald, "On the choice of the number of intervals in the application of the χ²-test," Annals of Math. Stat., Vol. 13 (1942), p. 306.
[44] J. Neyman, Lectures and conferences on mathematical statistics, Graduate School of the U. S. Dept. of Agriculture, 1938.
[45] J. Neyman, "Basic ideas and some recent results of the theory of testing statistical hypotheses," Journal Roy. Stat. Soc., Vol. CV (1942), p. 292.
[46] J. Neyman, "On a statistical problem arising in routine analysis and in sampling inspection of mass production," Annals of Math. Stat., Vol. 12 (1941), p. 46.
[47] S. N. Roy, "Notes on testing composite hypotheses, I, II," Sankhya, Vol. 8 (1947), p. 267 and Vol. 9 (1948), p. 19.
[48] H. Scheffé, "On the theory of testing composite hypotheses with one constraint," Annals of Math. Stat., Vol. 13 (1942), p. 280.
[49] H. Scheffé, "On the ratio of the variances of two normal samples," Annals of Math. Stat., Vol. 13 (1942), p. 371.
[50] C. Stein, "A two sample test for a linear hypothesis whose power is independent of the variance," Annals of Math. Stat., Vol. 16 (1945), p. 243.
[51] A. Wald, "Asymptotically most powerful tests of statistical hypotheses," Annals of Math. Stat., Vol. 12 (1941), p. 1.
[52] A. Wald, "Some examples of asymptotically most powerful tests," Annals of Math. Stat., Vol. 12 (1941), p. 366.
[53] A. Wald, "On the efficient design of statistical investigations," Annals of Math. Stat., Vol. 14 (1943), p. 134.



SAMPLE CRITERIA FOR TESTING OUTLYING OBSERVATIONS¹

By Frank E. Grubbs

University of Michigan and Ballistic Research Laboratories

1. Summary. The problem of testing outlying observations, although an old
one, is of considerable importance in applied statistics. Many and various types
of significance tests have been proposed by statisticians interested in this field
of application. In this connection, we bring out in the Historical Comments
notable advances toward a clear formulation of the problem and important
points which should be considered in attempting a complete solution. In Section
4 we state some of the situations the experimental statistician will very likely
encounter in practice, these considerations being based on experience. For testing
the significance of the largest observation in a sample of size n from a normal
population, we propose the statistic

\[
\frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

where $x_1 \le x_2 \le \cdots \le x_n$, $\bar{x}_n = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

A similar statistic, $S_1^2/S^2$, can be used for testing whether the smallest observation
is too low.

It turns out that

\[
\frac{S_n^2}{S^2} = 1 - \frac{1}{n-1} \left( \frac{x_n - \bar{x}}{s} \right)^2 = 1 - \frac{T_n^2}{n-1},
\]

where $s^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$, and $T_n$ is the studentized extreme deviation already
suggested by E. Pearson and C. Chandra Sekar [1] for testing the significance
of the largest observation. Based on previous work by W. R. Thompson [12],
Pearson and Chandra Sekar were able to obtain certain percentage points of $T_n$
without deriving the exact distribution of $T_n$. The exact distribution of $S_n^2/S^2$
(or $T_n$) is apparently derived for the first time by the present author.
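
The two statistics for the largest observation, and the algebraic relation between them, can be illustrated with a short Python sketch. The function name and the sample are hypothetical; the sketch assumes $s^2$ is computed with divisor n, so that $S_n^2/S^2 = 1 - T_n^2/(n-1)$.

```python
import math


def largest_observation_statistics(xs):
    """Return (S_n^2 / S^2, T_n) for testing the largest observation.

    S^2   : sum of squared deviations of all n observations from their mean
    S_n^2 : the same sum after omitting the largest observation
    T_n   : (x_n - mean) / s, with s^2 = S^2 / n
    """
    x = sorted(xs)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(x) / n
    s2_total = sum((v - mean) ** 2 for v in x)
    if s2_total == 0.0:
        raise ValueError("all observations are equal")
    rest = x[:-1]  # the n - 1 smallest observations
    mean_rest = sum(rest) / (n - 1)
    s2_rest = sum((v - mean_rest) ** 2 for v in rest)
    s = math.sqrt(s2_total / n)
    t_n = (x[-1] - mean) / s
    return s2_rest / s2_total, t_n


# Hypothetical sample in which the value 10 looks suspiciously large.
ratio, t_n = largest_observation_statistics([1.0, 2.0, 3.0, 10.0])
n = 4
identity_rhs = 1.0 - t_n ** 2 / (n - 1)
```

For this sample the ratio is 0.04. A small value of the ratio, equivalently a large value of $T_n$, points to the largest observation being too large.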

¹ This paper has been extracted from a thesis approved for the Degree of PhD at the
University of Michigan.

For testing whether the two largest observations are too large we propose the
statistic

\[
\frac{S_{n-1,n}^2}{S^2} = \frac{\sum_{i=1}^{n-2} (x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad \bar{x}_{n-1,n} = \frac{1}{n-2} \sum_{i=1}^{n-2} x_i,
\]

and a similar statistic, $S_{1,2}^2/S^2$, can be used to test the significance of the two
smallest observations. The probability distributions of the above sample statistics


TABLE I

Table of Percentage Points for $S_n^2/S^2$ or $S_1^2/S^2$

 n      1%      2.5%     5%      10%
 3     .0001   .0007   .0027   .0109
 4     .0100   .0248   .0494   .0975
 5     .0442   .0808   .1270   .1984
 6     .0928   .1453   .2032   .2820
 7     .1447   .2066   .2690   .3503
 8     .1948   .2616   .3261   .4050
 9     .2411   .3101   .3742   .4502
10     .2831   .3526   .4154   .4881
11     .3211   .3901   .4511   .5204
12     .3554   .4232   .4822   .5483
13     .3864   .4528   .5097   .5727
14     .4145   .4792   .5340   .5942
15     .4401   .5030   .5559   .6134
16     .4634   .5246   .5755   .6306
17     .4848   .5442   .5933   .6461
18     .5044   .5621   .6095   .6601
19     .5225   .5785   .6243   .6730
20     .5393   .5937   .6379   .6848
21     .5548   .6076   .6504   .6958
22     .5692   .6206   .6621   .7058
23     .5827   .6327   .6728   .7151
24     .5953   .6439   .6829   .7238
25     .6071   .6544   .6923   .7319

$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$ where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

$S_n^2 = \sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2$ where $\bar{x}_n = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$

$S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2$ where $\bar{x}_1 = \frac{1}{n-1} \sum_{i=2}^{n} x_i$

are derived and tables of appropriate percentage points are given
(Table I and Table V). Although the efficiencies of the above
tests have not been completely investigated under various models for outlying
observations, it is apparent that the proposed sample criteria have considerable
intuitive appeal. In deriving the distributions of the sample statistics for testing
the largest (or smallest) or the two largest (or two smallest) observations, it was
first necessary to derive the distribution of the difference between the extreme
observation and the sample mean in terms of the population $\sigma$. This probability


TABLE IA

Table of Percentage Points for $T_n = \frac{x_n - \bar{x}}{s}$ or $T_1 = \frac{\bar{x} - x_1}{s}$

 n      1%      2.5%     5%      10%
 3     1.414   1.414   1.412   1.406
 4     1.723   1.710   1.689   1.645
 5     1.955   1.917   1.869   1.791
 6     2.130   2.067   1.996   1.894
 7     2.265   2.182   2.093   1.974
 8     2.374   2.273   2.172   2.041
 9     2.464   2.349   2.237   2.097
10     2.540   2.414   2.294   2.146
11     2.606   2.470   2.343   2.190
12     2.663   2.519   2.387   2.229
13     2.714   2.562   2.426   2.264
14     2.759   2.602   2.461   2.297
15     2.800   2.638   2.493   2.326
16     2.837   2.670   2.523   2.354
17     2.871   2.701   2.551   2.380
18     2.903   2.728   2.577   2.404
19     2.932   2.754   2.600   2.426
20     2.959   2.778   2.623   2.447
21     2.984   2.801   2.644   2.467
22     3.008   2.823   2.664   2.486
23     3.030   2.843   2.683   2.504
24     3.051   2.862   2.701   2.520
25     3.071   2.880   2.717   2.537

$x_1 \le x_2 \le \cdots \le x_n$; $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$; $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$


distribution was apparently derived first by A. T. McKay [11] who employed
the method of characteristic functions. The author was not aware of the work of
McKay when the simplified derivation for the distribution of $(x_n - \bar{x})/\sigma$ outlined
in Section 5 below was worked out by him in the spring of 1945, McKay's result
being called to his attention by C. C. Craig. It has been noted also that K. R.
Nair [20] worked out independently and published the same derivation of the
distribution of the extreme minus the mean arrived at by the present author; see
Biometrika, Vol. 35, May, 1948. We nevertheless include part of this derivation
in Section 5 below as it was basic to the work in connection with the derivations
given in Sections 8 and 9. Our table is considerably more extensive than Nair's
table of the probability integral of the extreme deviation from the sample mean
in normal samples, since Nair's table runs from n = 2 to n = 9, whereas our
Table II is for n = 2 to n = 25. The present work is concluded with some examples.


2. Introduction. Scientific data are collected usually for purposes of interpretation
and if proper use is to be made of the information thus obtained then some
decision should be reached or some action taken as a result of analyzing the data.
In many cases a critical examination of the data collected is necessary in order
to insure that the results of sampling are representative of the thing or process
we are examining. Quite frequently our observations do not appear to be consistent
with one another, i.e. the data may seem to display non-homogeneities
and the group of observations as a whole may not appear to represent a random
sample from, say, a single normal population or universe. In particular, one or
more of the observations may have the appearance of being "outliers" and we
are interested here in determining once and for all whether such observations
should be retained in the sample for interpreting results or whether they should
be regarded as being inconsistent with the remaining observations. It is clear
that rejection of the "outliers" in a sample will in a great number of cases lead
to a different course of action than would have been taken had such observations
been retained in the sample. Actually, the rejection of "outlying" observations
may be just as much a practical (or common sense) problem as a statistical one
and sometimes the practical or experimental viewpoint may naturally outweigh
any statistical contributions. In this connection, the concluding remarks of

l • ar + 8 Pert 'T t: “ In the final ana, y 8i8 jt wouid seem that the 

question of the rejection or the retention of a discordant observation reduces to 

should bTallT 011 TV f ertabIy the judgment of an experienced observer 
cal LoubS m r6aching a deci8il,n * Thi « judgment 

theory of Drobabilff 6 T? * he ™ « «■ <>r tests baaed m, the 
. 7 p bbty ’ but any test which requires an inordinate amount of ealeu- 

tT to ,!» thetestimony of Sid, 

’Ofizxzsf?* TT* *«* *• -i- wfthStZ: “ 

■ ,‘ h ‘ l8tlC * 1 01 for i udKing or 

viewpoints or k „ZL, • h P ,' lthe ' m “PP^IS doubtful practical 
mental knowledee of 110^.'™ ° r ac ? 10n 1,1 f* 16 Absence of sufficient eoperi, 

In “' -— 

In the present treatment, we intend to throw so mc ligtt beyond wotk 






TABLE II—Conlinufd 


3 i s 6 r n 


VI 

■ \ 

2 

2.00 

,99f>32 

2.06 

.99626 

2.10 

.99702 

2.16 

.99764 

2 20 

99814 

2.26 

.99864 

2,30 

.99886 

2 35 

.99911 

2.40 

.99931 

2.46 

.99947 

2 50 

.99950 

2.66 

.99969 

2.60 

.99976 

2 66 

.99982 

2.70 

.99987 

2 76 

99990 

2 80 

.99992 

2.86 

.99994 

2.90 

.99996 

2 95 

.99997 

3 00 

.99998 

3 05 

99998 

3.10 

.99999 

3.16 

.99999 

3.20 

.99999 

3.25 

3 30 

3 36 
3.40 

3.45 

3.50 

3.65 

3.60 

3 65 
3.70 

3.75 

3 80 

3 85 
3.90 

3 95 

1.00000 


.97854 

.95818 

.98193 

.96416 

98483 

.96938 

98731 

.97392 

.98942 

.97785 

.99121 

.98125 

.99273 

.98418 

99400 

.98669 

.99607 

.98883 

99696 

.99066 

.99670 

.99222 

.99732 

.99353 

.99782 

,90464 

.99824 

.99657 

.99858 

.99035 

. 99886 

.90701 

.99909 

.99755 

.99928 

.99800 

.99943 

.90838 

.99966 

.99868 

.99964 

,99894 

.99972 

.99914 

.99978 

.99931 

.99983 

.99945 

.99987 

.99956 

99990 

,99965 

99992 

.99972 

99994 

.99978 

.99995 

99983 

.99996 

.99986 

.99997 

.99980 

.99998 

.99992 

.99998 

.99004 

.99099 

.99995 

99999 

.99006 

.99999 

.99997 

1.00000 

99998 


.99998 


.99999 


.99999 


.93682 

.91526 

.94530 

.92827 

,95289 

.93605 

.95949 

.94468 

.90527 

.95229 

.97032 

.95807 

.97470 

.06482 

.07850 

,96092 

.98178 

.97435 

.98461 

97810 

98703 

.98151 

.98911 

.98436 

99088 

.98681 

,99238 

.98891 

.99365 

.99070 

,99473 

.99223 

.99564 

.99352 

,99640 

,90461 

.99704 

.99553 

.99767 

.99631 

.99801 

.99096 

.99838 

.99760 

.99868 

.99795 

.99893 

.09832 

.99913 

.99863 

.99930 

.99889 

.99944 

.99910 

.99955 

.99927 

.99964 

.99941 

.99971 

.99953 

.99977 

.99062 

.99982 

.99070 

.99980 

.99978 

.99989 

.99981 

.99991 

.09985 

.09993 

.99988 

.99995 

.99991 

99996 

.99903 

.99997 

.90994 

99997 

.99995 


.89381 

,87264 

90721 

.88832 

.91916 

9<m« 

,921177 

91490 

03917 

9WH 

94746 

.93591 

95476 

94462 

.96114 

95239 

.96672 

.mm 

.97158 

.06457 

.07580 

.96999 

97014 

.97413 

.98253 

.1(7827 

.9K529 

.9H15H 

.08761 

.98143 

.98959 

0KRHH 

.99128 

.0HH97 

.99272 

09075 

.99393 

.90227 

.99496 

.99365 

.99582 

.99164 

.99655 

.99566 

.99716 

.90632 

.99766 

.99697 

.99808 

.99750 

.99843 

,99795 

.99872 

.99832 

.99896 

.90863 

.99916 

.99889 

.99932 

.99910 

.09946 

.99927 

.99966 

.09941 

.89966 

.09952 

.99972 

.99962 

.09977 

.99909 

.99082 

.99970 

.99986 

.99981 

.99989 

.99985 

.99091 

.99988 

.99993 

.99990 


.mim a 
, 1 m a 
*wi ;a 
wma ja 
■ti'rti ia 

.92138 i2 
Mils 
.!««) J‘a 
.U5m 3a 
.tiwa ia 

.90112 3 
9fiU1 2 
. kt.w.i 'a 
(a 

Ml'.11 ,2 

win |a 
. own ia 
swi ia 
■wm ja 
.Man a 

,99343 li 

.Ml 53 3 

.09540 3 
.99625 3 

,09090 3 

.mm a 

.00701 3. 

.90 S21I 3. 

.(Kim 3 . 

.99880 3. 

.99908 3, 
.00025 3, 
.99040 3. 
.99051 3, 

.90001 3. 

. 99969 3. 

.99975 3. 

.99980 3. 
.99984 3. 
.99987 3. 


§ 8 8 § S oSSSS 8 S 8 S 8 8SSS8 S s £ 5 » oSSSJS SSS8S S53S8 





TABLE II —Continued 


\ IP 

■\ 

2 

3 

•1 

5 

f> 

7 

8 

9 

W 

4.00 



.99990 

.09908 

.99996 

.09995 

.99092 

.99990 

4.00 

4.05 



.99999 

.99999 

.99997 

.99996 

.99994 

99992 

4.05 

4,10 



1 OOOOO 

.99999 

.99998 

.09997 

.99095 

.99994 

4.10 

4.15 




99999 

.99998 

99097 

.99998 

.99995 

4.15 

4.20 




.99999 

.99999 

.90998 

.99997 

.99996 

4.20 

4.25 




.99999 

.99999 

.99998 

99998 

.99997 

4.25 

4.30 




1.00000 

.99999 

.99999 

.99998 

.99998 

4.30 

4.35 





.99999 

.99999 

.90999 

,99998 

4.35 

4.40 





1.00000 

.99999 

.99999 

.99999 

4.40 

4.46 






99999 

.99999 

,99999 

4.45 

4.50 






1.00000 

.99999 

.99999 

4.50 

4.56 







1 OOOOO 

.99909 

4.65 

4,60 








1.00000 

4.60 

\ n 
u \ 

10 

u 

12 

13 

14 

15 

16 

17 

U 

.25 

.00001 

.00000 

.00000 

00000 

00000 

.OOOOO 

.00000 

.00000 

25 

.30 

.00003 

00001 

.00000 

.00000 

.00000 

.00000 

.00000 

.00000 

.30 

.35 

.00011 

.00004 

.00001 

00001 

.00000 

.00000 

.00000 

.00000 

.35 

.40 

.00032 

.00013 

.00005 

.00002 

.00001 

.00000 

,00000 

.00000 

.40 

.45 

.00080 

.0003G 

.00016 

.00007 

.00003 

.00001 

.00001 

.00000 

.45 

.60 

.00178 

.00086 

.00042 

.00021 

.00010 

.00005 

.00002 

.00001 

,50 

.65 

.00351 

.00185 

.00098 

.00051 

.00027 

,00014 

.00008 

.00004 

K73 

.60 

.00643 

.00303 

.00204 

.00115 

.00065 

.00037 

.00021 

00012 

.60 

.65 

01098 

.00657 

00393 

.00235 

00141 

.00084 

.00050 

.00030 

.65 

.70 

.01760 

01113 

00702 

.00443 

.00279 

.00176 

.00111 

.00070 

.70 

.75 

02694 

.01780 

01177 

.00777 

.00514 

.00339 

.00224 

.00148 

75 

.80 

.03928 

.02707 

.01865 

.01285 

.00886 

.00610 

00420 

00289 

.80 

.85 

.05503 

.03938 

02818 

.02016 

.01442 

.01031 

.00738 

.00527 

.85 

.90 

.07444 

.05610 

04077 

.03017 

.02232 

.01652 

.01222 

.00901 

.90 

.95 

.09761 

.07448 

.05682 

.04334 

.03305 

.02521 

01922 

.01460 

.95 

1.00 

.12452 

.09763 

.07665 

06000 

.04703 

.03687 

.02889 

.02205 

1.00 

1.05 

15497 

,12454 

.10008 

.08041 

.06460 

.05190 

.04160 

.03348 

1.05 

1.10 

.18807 

.15503 

.12737 

.10464 

.08595 

.07000 

.05799 

.04702 

1.10 

1.15 

.22520 

. 18879 

. 15825 

.13203 

.11110 

.09315 

.07800 

.00541 

1.15 

1.20 

.26407 

.22542 

.19240 

.10420 

.14013 

11957 

.10203 

.08700 

1.20 

1 25 

.30475 

.26442 

.22941 

.19901 

.17263 

. 14973 

.12987 

,11204 

1.25 

1.30 

34666 

.30525 

26876 

.23662 

.20830 

.18336 

.10140 

.14207 

1.30 

1.35 

.38924 

.34734 

30992 

.27050 

.24007 

22005 

,19629 

.17509 

1.35 

1.40 

.43196 

.39011 

35229 

31810 

.28721 

.25931 

.23411 

21135 

1.40 

1.45 

.47430 

.43302 

39529 

.36082 

.32934 

.30058 

.27433 

.25036 

1.45 







TABLE II —Continued 


u \ n     10      11      12      13      14      15      16      17    u

1.50  .51583  .47555  .43838  .40408  .37244  .34327  .31636  .29150  1.50
1.55  .55615  .51726  .48104  .44733  .41595  .38676  .35960  .33434  1.55
1.60  .59495  .55774  .52282  .49004  .45930  .43046  .40342  .37807  1.60
1.65  .63196  .59668  .56332  .53178  .50199  .47384  .44726  .42216  1.65
1.70  .66699  .63380  .60221  .57216  .54358  .51641  .49058  .46602  1.70
1.75  .69991  .66892  .63925  .61086  .58370  .55773  .53289  .50915  1.75
1.80  .73063  .70189  .67424  .64763  .62204  .59744  .57380  .55108  1.80
1.85  .75912  .73264  .70704  .68229  .65838  .63528  .61297  .59144  1.85
1.90  .78538  .76113  .73758  .71472  .69254  .67102  .65016  .62992  1.90
1.95  .80945  .78737  .76584  .74486  .72443  .70453  .68516  .66630  1.95
2.00  .83141  .81140  .79183  .77269  .75399  .73571  .71786  .70042  2.00
2.05  .85133  .83330  .81560  .79824  .78121  .76453  .74819  .73218  2.05
2.10  .86932  .85314  .83721  .82155  .80614  .79101  .77614  .76153  2.10
2.15  .88550  .87105  .85678  .84271  .82885  .81519  .80174  .78849  2.15
2.20  .89998  .88713  .87440  .86183  .84941  .83715  .82505  .81311  2.20
2.25  .91290  .90151  .89021  .87902  .86796  .85699  .84616  .83545  2.25
2.30  .92437  .91431  .90432  .89441  .88458  .87484  .86518  .85563  2.30
2.35  .93453  .92568  .91688  .90812  .89943  .89081  .88224  .87376  2.35
2.40  .94348  .93572  .92799  .92030  .91264  .90504  .89748  .88997  2.40
2.45  .95134  .94457  .93781  .93106  .92435  .91766  .91101  .90440  2.45
2.50  .95823  .95233  .94644  .94055  .93468  .92883  .92300  .91720  2.50
2.55  .96424  .95912  .95400  .94887  .94376  .93866  .93357  .92850  2.55
2.60  .96948  .96504  .96060  .95616  .95172  .94728  .94285  .93844  2.60
2.65  .97401  .97019  .96635  .96251  .95866  .95482  .95098  .94715  2.65
2.70  .97793  .97464  .97134  .96802  .96471  .96139  .95807  .95475  2.70
2.75  .98131  .97849  .97565  .97280  .96995  .96709  .96423  .96137  2.75
2.80  .98422  .98180  .97937  .97693  .97448  .97203  .96957  .96712  2.80
2.85  .98671  .98464  .98257  .98048  .97839  .97629  .97418  .97208  2.85
2.90  .98883  .98708  .98531  .98353  .98174  .97995  .97816  .97636  2.90
2.95  .99064  .98915  .98765  .98614  .98462  .98309  .98156  .98003  2.95
3.00  .99218  .99092  .98965  .98837  .98708  .98578  .98448  .98318  3.00
3.05  .99348  .99242  .99134  .99026  .98917  .98807  .98697  .98587  3.05
3.10  .99458  .99369  .99278  .99187  .99095  .99002  .98909  .98816  3.10
3.15  .99551  .99476  .99400  .99323  .99245  .99167  .99089  .99010  3.15
3.20  .99628  .99566  .99502  .99437  .99372  .99307  .99241  .99175  3.20
3.25  .99694  .99641  .99588  .99534  .99479  .99424  .99369  .99314  3.25
3.30  .99748  .99704  .99660  .99615  .99569  .99523  .99477  .99431  3.30
3.35  .99793  .99757  .99720  .99682  .99644  .99606  .99568  .99529  3.35
3.40  .99831  .99801  .99770  .99739  .99707  .99676  .99644  .99611  3.40
3.45  .99862  .99837  .99812  .99786  .99760  .99733  .99707  .99680  3.45













TESTING OUTLYING OBSERVATIONS 35 


TABLE II —Continued 


u \ n     10      11      12      13      14      15      16      17    u

3.50  .99888  .99867  .99846  .99825  .99803  .99781  .99759  .99737  3.50
3.55  .99909  .99892  .99875  .99857  .99839  .99821  .99803  .99785  3.55
3.60  .99926  .99912  .99898  .99884  .99869  .99854  .99839  .99824  3.60
3.65  .99940  .99929  .99917  .99906  .99894  .99881  .99869  .99857  3.65
3.70  .99952  .99943  .99933  .99924  .99914  .99904  .99894  .99883  3.70
3.75  .99961  .99954  .99946  .99938  .99930  .99922  .99914  .99906  3.75
3.80  .99969  .99963  .99957  .99950  .99944  .99937  .99930  .99923  3.80
3.85  .99976  .99970  .99965  .99960  .99955  .99949  .99944  .99938  3.85
3.90  .99980  .99976  .99972  .99968  .99964  .99959  .99955  .99950  3.90
3.95  .99984  .99981  .99978  .99974  .99971  .99967  .99964  .99960  3.95
4.00  .99988  .99985  .99982  .99980  .99977  .99974  .99971  .99968  4.00
4.05  .99990  .99988  .99986  .99984  .99982  .99979  .99977  .99974  4.05
4.10  .99992  .99991  .99989  .99987  .99985  .99983  .99981  .99979  4.10
4.15  .99994  .99993  .99991  .99990  .99988  .99987  .99985  .99984  4.15
4.20  .99995  .99994  .99993  .99992  .99991  .99990  .99988  .99987  4.20
4.25  .99996  .99995  .99995  .99994  .99993  .99992  .99991  .99990  4.25
4.30  .99997  .99996  .99996  .99995  .99994  .99993  .99993  .99992  4.30
4.35  .99998  .99997  .99997  .99996  .99996  .99995  .99994  .99993  4.35
4.40  .99998  .99998  .99997  .99997  .99996  .99996  .99996  .99995  4.40
4.45  .99999  .99998  .99998  .99998  .99997  .99997  .99996  .99996  4.45
4.50  .99999  .99999  .99998  .99998  .99998  .99998  .99997  .99997  4.50
4.55  .99999  .99999  .99999  .99999  .99998  .99998  .99998  .99997  4.55
4.60  .99999  .99999  .99999  .99999  .99999  .99998  .99998  .99998  4.60
4.65 1.00000  .99999  .99999  .99999  .99999  .99999  .99999  .99998  4.65
4.70         1.00000  .99999  .99999  .99999  .99999  .99999  .99999  4.70
4.75                 1.00000 1.00000  .99999  .99999  .99999  .99999  4.75
4.80                                 1.00000  .99999  .99999  .99999  4.80
4.85                                         1.00000 1.00000 1.00000  4.85

u \ n     18      19     20     21     22     23     24     25    u

 .50          .00000  .0000  .0000  .0000  .0000  .0000  .0000   .50
 .55          .00001  .0000  .0000  .0000  .0000  .0000  .0000   .55
 .60  .00007  .00004  .0000  .0000  .0000  .0000  .0000  .0000   .60
 .65  .00018  .00011  .0001  .0000  .0000  .0000  .0000  .0000   .65
 .70  .00044  .00028  .0002  .0001  .0001  .0000  .0000  .0000   .70
 .75  .00098  .00065  .0004  .0003  .0002  .0001  .0001  .0001   .75
 .80  .00199  .00137  .0009  .0007  .0004  .0003  .0002  .0001   .80
 .85  .00377  .00270  .0019  .0014  .0010  .0007  .0006  .0004   .85
 .90  .00669  .00494  .0037  .0027  .0020  .0015  .0011  .0008   .90
 .95  .01118  .00853  .0065  .0049  .0038  .0029  .0022  .0017   .95











TABLE II —Continued 


u \ n     18      19     20     21     22     23     24     25    u

1.00  .01775  .01391  .0109  .0085  .0067  .0052  .0041  .0032  1.00
1.05  .02690  .02161  .0174  .0139  .0112  .0090  .0072  .00??  1.05
1.10  .03911  .03212  .0264  .0217  .0178  .0146  .0120         1.10
1.15  .05481  .04592  .0385  .0322  .0270  .0226  .0190  .0159  1.15
1.20  .07428  .06338  .0541  .0461  .0394  .0336  .0287  .0244  1.20
1.25  .09769  .08472  .0735  .0637  .0553  .0479  .0416  .0360  1.25
1.30  .12504  .11005  .0969  .0853  .0750  .0660  .0581  .0512  1.30
1.35  .15618  .13930  .1242  .1108  .0988  .0882  .0786  .0701  1.35
1.40  .19080  .17225  .1555  .1404  .1267  .1144  .1033  .0932  1.40
1.45  .22848  .20851  .1903  .1736  .1585  .1446  .1320  .1204  1.45
1.50  .26869  .24761  .2282  .2103  .1938  .1786  .1646  .1516  1.50
1.55  .31084  .28899  .2687  .2498  .2322  .2159  .2007  .1866  1.55
1.60  .35430  .33202  .3111  .2916  .2732  .2560  .2399  .2248  1.60
1.65  .39846  .37607  .3549  .3349  .3162  .2984  .2816  .2658  1.65
1.70  .44269  .42052  .3994  .3794  .3604  .3424  .3252  .3089  1.70
1.75  .48645  .46476  .4440  .4242  .4053  .3872  .3699  .3534  1.75
1.80  .52924  .50827  .4881  .4687  .4502  .4323  .4152  .3987  1.80
1.85  .57065  .55058  .5312  .5126  .4946  .4771  .4603  .4441  1.85
1.90  .61031  .59130  .5729  .5549  .5377  .5209  .5047  .4890  1.90
1.95  .64796  .63011  .6127  .5958  .5794  .5634  .5479  .5328  1.95
2.00  .68340  .66678  .6506  .6348  .6193  .6042  .5895  .5752  2.00
2.05  .71650  .70114  .6861  .6714  .6570  .6429  .6291  .6156  2.05
2.10  .74719  .73311  .7193  .7058  .6924  .6793  .6665  .6540  2.10
2.15  .77545  .76262  .7500  .7375  .7254  .7133  .7015  .6899  2.15
2.20  .80132  .78971  .7782  .7670  .7558  .7448  .7340  .7234  2.20
2.25  .82486  .81440  .8041  .7938  .7838  .7738  .7640  .7543  2.25
2.30  .84616  .83679  .8275  .8184  .8093  .8003  .7914  .7827  2.30
2.35  .86533  .85699  .8487  .8405  .8324  .8244  .8164  .8085  2.35
2.40  .88251  .87511  .8678  .8605  .8533  .8461  .8390  .8319  2.40
2.45  .89783  .89129  .8848  .8784  .8720  .8656  .8593  .8530  2.45
2.50  .91142  .90568  .9000  .8943  .8887  .8831  .8775  .8719  2.50
2.55  .92345  .91842  .9134  .9084  .9035  .8985  .8936  .8888  2.55
2.60  .93404  .92965  .9253  .9209  .9166  .9123  .9080  .9037  2.60
2.65  .94332  .93951  .9357  .9319  .9282  .9244  .9207  .9169  2.65
2.70  .95144  .94814  .9448  .9416  .9382  .9351  .9318  .9286  2.70
2.75  .95852  .95567  .9528  .9500  .9472  .9444  .9415  .9387  2.75
2.80  .96466  .96220  .9598  .9573  .9549  .9524  .9500  .9476  2.80
2.85  .96997  .96787  .9658  .9637  .9616  .9595  .9574  .9553  2.85
2.90  .97456  .97275  .9710  .9692  .9674  .9656  .9638  .9620  2.90
2.95  .97850  .97696  .9754  .9739  .9724  .9709  .9693  .9678  2.95







TABLE II —Continued 


u \ n     18      19     20     21     22     23     24     25    u

3.00  .98187  .98057  .9793  .9780  .9767  .9753  .9741  .9728  3.00
3.05  .98476  .98365  .9825  .9814  .9803  .9793  .9781  .9771  3.05
3.10  .98722  .98626  .9853  .9844  .9835  .9826  .9816  .9807  3.10
3.15  .98931  .98852  .9877  .9869  .9862  .9853  .9846  .9838  3.15
3.20  .99108  .99042  .9898  .9891  .9884  .9878  .9871  .9865  3.20
3.25  .99258  .99202  .9915  .9909  .9904  .9898  .9893  .9887  3.25
3.30  .99384  .99337  .9929  .9924  .9920  .9915  .9911  .9906  3.30
3.35  .99490  .99451  .9941  .9937  .9933  .9930  .9926  .9922  3.35
3.40  .99579  .99546  .9951  .9948  .9945  .9942  .9939  .9936  3.40
3.45  .99653  .99626  .9960  .9957  .9955  .9952  .9949  .9947  3.45
3.50  .99716  .99693  .9967  .9965  .9963  .9961  .9958  .9956  3.50
3.55  .99766  .99748  .9973  .9971  .9969  .9968  .9966  .9964  3.55
3.60  .99809  .99794  .9978  .9976  .9975  .9973  .9972  .9971  3.60
3.65  .99844  .99832  .9982  .9981  .9979  .9978  .9977  .9976  3.65
3.70  .99873  .99863  .9985  .9984  .9983  .9982  .9982  .9981  3.70
3.75  .99897  .99889  .9988  .9987  .9986  .9986  .9985  .9984  3.75
3.80  .99917  .99910  .9990  .9990  .9989  .9988  .9988  .9988  3.80
3.85  .99933  .99927  .9992  .9992  .9991  .9991  .9990  .9990  3.85
3.90  .99946  .99941  .9994  .9993  .9993  .9993  .9992  .9992  3.90
3.95  .99956  .99953  .9995  .9995  .9994  .9994  .9994  .9994  3.95
4.00  .99965  .99962  .9996  .9996  .9995  .9995  .9995  .9995  4.00
4.05  .99972  .99969  .9997  .9996  .9996  .9996  .9996  .9996  4.05
4.10  .99977  .99975  .9997  .9997  .9997  .9997  .9997  .9997  4.10
4.15  .99982  .99980  .9998  .9998  .9998  .9998  .9998  .9998  4.15
4.20  .99986  .99984  .9998  .9998  .9998  .9998  .9998  .9998  4.20
4.25  .99989  .99987  .9999  .9999  .9999  .9999  .9999  .9999  4.25
4.30  .99991  .99990  .9999  .9999  .9999  .9999  .9999  .9999  4.30
4.35  .99993  .99992  .9999  .9999  .9999  .9999  .9999  .9999  4.35
4.40  .99994  .99994  .9999  .9999  .9999  .9999  .9999  .9999  4.40
4.45  .99995  .99995 1.0000  .9999  .9999  .9999  .9999  .9999  4.45
4.50  .99996  .99996        1.0000 1.0000 1.0000  .9999  .9999  4.50
4.55  .99997  .99997                            1.0000 1.0000  4.55
4.60  .99998  .99997                                            4.60
4.65  .99998  .99998                                            4.65
4.70  .99998  .99998                                            4.70
4.75  .99999  .99998                                            4.75
4.80  .99999  .99999                                            4.80
4.85  .99999  .99999                                            4.85
4.90 1.00000 1.00000                                            4.90




that has already been done [1], [2], [3], [4], [11], [12], [20] on the problem of
testing outlying observations statistically and to see just where our contributions
fit into this corner of mathematical statistics. First, however, we give a very
brief history of the problem.


3. Historical comments. A survey of statistical literature indicates that the
problem of testing the significance of outlying observations received considerable
attention prior to 1937. Since this date, however, published literature on the
subject seems to have been unusually scant, perhaps because of inherent
difficulties in the problem as pointed out by E. S. Pearson and C. Chandra Sekar [1].
These authors made some important contributions to the problem of outlying
observations by bringing clearly into the foreground the concept of efficiency of
tests which may be used in view of admissible alternative hypotheses.

In 1933, P. R. Rider [2] published a rather comprehensive survey of work on
the problem of testing the significance of outlying observations up to that date.
The test criteria surveyed by Rider appear to impose as an initial condition that
the standard deviation, $\sigma$, of the population from which the items were drawn
should be known accurately. In connection with such tests requiring accurate
knowledge of $\sigma$, we mention (1) Irwin's criteria [3] which utilize the difference
between the first two individuals or the difference between the second and third
individuals in random samples from a normal population and (2) the range² or
maximum dispersion [4], [5], [6], [7], [8], [9], [10], [18] of a sample which has been
advocated by "Student" [4] and others for testing the significance of outlying
observations. We remark further that a natural statistic to use for testing an
"outlier" is the difference between such an extreme observation and the sample
mean. In 1935, McKay [11] published a note on the distribution of the
last-mentioned statistic and, by means of a rather elaborate procedure, obtained a
recurrence relation between the distribution of the extreme minus the mean in
samples of $n$ from a normal universe and the distribution of this statistic in
samples of $n-1$ from the same parent. McKay gave also an approximate
expression for the upper percentage points of the distribution but did not tabulate the
exact distribution due to the complexity of the multiple integrals involved.
McKay pointed out that if $K_p$ denotes the p-th semi-invariant of the distribution
of $x_n - \bar{x}$ (where $x_n$ is the largest observation) and $K'_p$ refers similarly to the
distribution of $x_n$, then $K_1 = K'_1$, $K_2 = K'_2 - 1/n$ and $K_p = K'_p$ ($p \ge 3$).
[...] tabulated the distribution of the difference between the extreme and
sample mean for n = 2 to n = 9.
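McKay's relation between the semi-invariants can be checked by hand for n = 2. The sketch below assumes unit $\sigma$ and uses the standard moments of the larger of two independent N(0, 1) values (mean $1/\sqrt{\pi}$, variance $1 - 1/\pi$); these moments and the function name `mckay_n2` are assumptions brought in for illustration, not quantities quoted from the paper.

```python
import math

def mckay_n2():
    """Mean and standard deviation of x_n - xbar for n = 2 (unit sigma)."""
    n = 2
    # Moments of the larger of two independent N(0, 1) values
    # (standard results, assumed here rather than taken from the paper).
    k1_max = 1.0 / math.sqrt(math.pi)   # K'_1
    k2_max = 1.0 - 1.0 / math.pi        # K'_2
    k1 = k1_max                         # K_1 = K'_1
    k2 = k2_max - 1.0 / n               # K_2 = K'_2 - 1/n
    return k1, math.sqrt(k2)

mean, sd = mckay_n2()
print(f"{mean:.4f} {sd:.4f}")
```

The two numbers agree with the n = 2 entries (.5642 and .4263) of the moment table given later in the paper.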

Under certain circumstances, accurate knowledge concerning $\sigma$ may be
available as, for example, in using "daily control" tests [4], [18] [...]
in some cases with sufficient

² [...] is given in reference [...], 1942; however, [...] of the Ballistic
Research Laboratory also derived the exact distribution of the range in an
unpublished Aberdeen Proving Ground Report (1926).





data. In general, however, an accurate estimate of $\sigma$ may not be available and
it becomes necessary to estimate the population standard deviation from the
single sample involved or "Studentize" [18], [20] the statistic to be used, thus
providing a true measure of the risks involved in the significance test advocated
for testing outlying observations. W. R. Thompson [12] apparently had this very
point in mind when he devised an exact test in his paper, "On a Criterion for
the Rejection of Observations and the Distribution of the Ratio of the Deviation
to the Sample Standard Deviation," which appeared in 1935. Thompson showed
that if

$$T = \frac{x_i - \bar{x}}{s},$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $x_i$ is an observation selected
arbitrarily from a random sample of $n$ items drawn from a normal parent, then the
probability density function of

$$t = \frac{T\sqrt{n-2}}{\sqrt{n-1-T^2}}$$

is given by "Student's" t-distribution with $f = n - 2$ degrees of freedom.
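As a small illustration of Thompson's transformation, the sketch below computes $T$ for one observation and converts it to $t$. The function names and the sample values are hypothetical; note that $s$ uses the divisor $n$, as in the definition above.

```python
import math

def thompson_T(sample, i):
    """Deviation of observation i from the mean, in units of s (divisor n)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / n)
    return (sample[i] - xbar) / s

def student_t_from_T(T, n):
    """Thompson's transformation of T to a t variate with n - 2 degrees of freedom."""
    return T * math.sqrt(n - 2) / math.sqrt(n - 1 - T * T)

sample = [9.8, 10.1, 10.0, 9.9, 10.2, 11.5]   # illustrative data only
T = thompson_T(sample, 5)
t = student_t_from_T(T, len(sample))
print(T, t)
```

Because $T^2$ can never exceed $n - 1$, the square root in the denominator is always real; looking up $t$ in a table of Student's distribution with $n - 2$ degrees of freedom then gives the significance level for an arbitrarily chosen observation.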

Pearson and Chandra Sekar have given a rather comprehensive study of
Thompson's criterion in an interesting and important paper [1] which appeared
in 1936. They discussed also some very important viewpoints which should be
taken into consideration when dealing with the problem of testing outlying
observations. By setting up alternatives to the null-hypothesis $H_0$ that all items
in the sample come from the same population, Pearson and Chandra Sekar point
out that if only one of the observations actually came from a population with
divergent mean, then Thompson's criterion would be very useful, whereas if
two or more of the observations are truly outlying then the criterion
$|x_i - \bar{x}| > T_0 s$ may be quite ineffective, particularly if the sample contains
less than about 30 or 40 observations.

A point of major interest concerning Thompson's work nevertheless is that he
proposed an exact test for the hypothesis that all of the observations came from
the same normal population. With regard to the use of an arbitrary observation
in Thompson's test, however, it should be borne in mind that the problem of
finding the probability that an arbitrary observation will be outlying is different
from that of finding the probability that a particular observation (the largest,
for example) will be outlying with respect to the other $n - 1$ observations of
the sample.

As a final point concerning the paper of Pearson and Chandra Sekar [1], we
see that for the $n$ values of $T$, arranged in order of magnitude taking account of
sign, say

$$T^{(1)}, T^{(2)}, \ldots, T^{(n)},$$

then

$$T^{(1)} \ge T^{(2)} \ge \cdots \ge T^{(n)}.$$


The above authors show that the form of the total distribution of all the $T_i$
at its extremes depends only on $T^{(1)}$ and $T^{(n)}$. This is because for some
combinations of sample size and percentage points the algebraic upper limit for
$T^{(2)}$ and algebraic lower limit for $T^{(n-1)}$ do not extend into the "tails" of the
total distribution. Hence, the following probability law holds for $T^{(1)}$ when
$T^{(1)} >$ the algebraic maximum of $T^{(2)}$:

$$p\{T^{(1)}\} = N\,p(T).$$

Likewise,

$$p\{T^{(n)}\} = N\,p(T)$$

for $T^{(n)} <$ algebraic minimum of $T^{(n-1)}$. Therefore, Pearson and Chandra Sekar
were able to use Thompson's table [12] and give (for some sample sizes) upper
probability limits for $T^{(1)} = (x_n - \bar{x})/s$ for the highest observation and lower
probability limits for $T^{(n)} = (x_1 - \bar{x})/s$ for the lowest observation without
actually obtaining the exact probability distribution of $T^{(1)}$ and $T^{(n)}$. Hence, the
appearance of the table of percentage points on page 318 of their paper [1] was a
substantial contribution to the problem of testing outlying observations since an
exact test for the significance of a single outlying observation was provided for
the case where an accurate estimate of $\sigma$ is not available. (The exact distribution
of $T^{(1)}$ or $T^{(n)}$ is derived later in this work.)

With the above highlights of historical background in mind, we turn now to a
consideration of the types of problems the experimenter may be faced with in
testing "outlying" observations.


4. Statement of hypotheses in tests of outliers. Once the sample results of
an experiment are available, the practicing statistician may be confronted with
one or more of the following distinct situations as regards discordant
observations: (a) To begin with, a very frequent or perhaps prevalent situation is
that either the greatest or the least observation in a sample may have
the appearance of belonging to a different population than the one from which
the remaining observations were drawn. Here we are confronted with tests for
[...] (b) [...] the hypothesis that both the largest and the smallest [...]
outliers. (c) Another [...] the two [...] observations may have the
appearance [...] as not being representative of the [...] observations [...].





As to why the discordant observations in a sample may be outliers, this may 
be due to errors of measurement in which case we would naturally want to reject 
or at least “correct” such observations. On the other hand, it may be that the 
population we are sampling is not homogeneous in the uni-modal sense and it 
will consequently be desirable to know this so that we may carry out further 
development work on our product if possible or desirable. 

Although there may be many models for outliers, we believe that an important
practical case involves the situation where all the observations in the sample
may be subject to the same standard error, whereas it may happen that the
largest or smallest observations result from shifts in level. For example, if one
observation appears unusually high compared to the others in the sample we
may want to consider the hypothesis that all the observations come from a
normal parent with mean $\mu$ and standard deviation $\sigma$ as against the alternative
hypothesis that the largest observation comes from a normal population with
mean $\mu + \lambda\sigma$ ($\lambda > 0$) and standard deviation $\sigma$, whereas the remaining
observations are from $N(\mu, \sigma)$.

Another case involves the situation where the largest and/or smallest
observations may be from $N(\mu, \lambda\sigma)$, $\lambda > 1$, whereas the remaining observations of
the sample are from the normal parent $N(\mu, \sigma)$.

Although we have not investigated the power of the tests proposed herein for
various models, it is believed that the exact test of Section 8 for the largest (or
smallest) observation and the test of Section 9 for the two largest (or two
smallest) observations possess considerable intuitive appeal for the practical
situations described above.³
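The two alternative models described above can be explored by simulation. The following is only a sketch under stated assumptions: it contaminates one designated observation (which need not turn out to be the largest, unlike the wording of the hypothesis), and the function names, the default seed and the number of trials are hypothetical choices.

```python
import random

def simulate_sample(n, lam, mu=0.0, sigma=1.0, model="shift", rng=random):
    """n - 1 values from N(mu, sigma) plus one contaminated value (returned last)."""
    sample = [rng.gauss(mu, sigma) for _ in range(n - 1)]
    if model == "shift":      # shift in level: N(mu + lam * sigma, sigma)
        sample.append(rng.gauss(mu + lam * sigma, sigma))
    elif model == "scale":    # inflated spread: N(mu, lam * sigma)
        sample.append(rng.gauss(mu, lam * sigma))
    else:
        raise ValueError("model must be 'shift' or 'scale'")
    return sample

def fraction_contaminant_is_largest(n, lam, trials=20000, seed=1):
    """Share of simulated samples in which the shifted value is the largest."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = simulate_sample(n, lam, rng=rng)
        if sample[-1] == max(sample):
            hits += 1
    return hits / trials

print(fraction_contaminant_is_largest(10, 0.0))
print(fraction_contaminant_is_largest(10, 3.0))
```

With $\lambda = 0$ every observation is equally likely to be the largest, so the fraction should be near $1/n$; it rises toward 1 as $\lambda$ grows, which is the situation the tests for a single outlier are meant to detect.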


5. Distribution of the difference between the extreme and mean in samples
of n from a normal population. The simultaneous density function of $n$
independent observations from a normal parent with zero mean and variance $\sigma^2$
which are arranged in order of magnitude is given by

(1)  $dF(x_1, x_2, \ldots, x_n) = \frac{n!}{(2\pi)^{n/2}\sigma^n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right] dx_1\,dx_2 \cdots dx_n$

subject to $x_1 \le x_2 \le \cdots \le x_n$.

Since

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}(x_n - \bar{x})^2 + \sum_{i=1}^{n-1}(x_i - \bar{x}_n)^2$$

where

$$\bar{x}_n = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i,$$


³ The author is indebted to J. W. Tukey and S. S. Wilks for calling attention to an
incorrect distribution function in the originally submitted manuscript on which several
yet-to-be proved or disproved statements concerning optimum properties of statistics in
this paper were based.





then

(2)  $\sum_{i=1}^{n} x_i^2 = n\bar{x}^2 + \frac{n}{n-1}(x_n - \bar{x})^2 + \frac{n-1}{n-2}(x_{n-1} - \bar{x}_n)^2 + \frac{n-2}{n-3}(x_{n-2} - \bar{x}_{n-1})^2 + \cdots + \frac{3}{2}\left(x_3 - \frac{x_1 + x_2 + x_3}{3}\right)^2 + \frac{2}{1}\left(x_2 - \frac{x_1 + x_2}{2}\right)^2$

where

$$\bar{x}_{n-1} = \frac{1}{n-2}\sum_{i=1}^{n-2} x_i, \text{ etc.,}$$

and consequently we find that we are particularly interested in the following
Helmert orthogonal transformation:

(3)
$\sqrt{2 \cdot 1}\,\sigma\eta_2 = -x_1 + x_2,$
$\sqrt{3 \cdot 2}\,\sigma\eta_3 = -x_1 - x_2 + 2x_3,$
$\cdots$
$\sqrt{n(n-1)}\,\sigma\eta_n = -x_1 - x_2 - x_3 - \cdots - x_{n-1} + (n-1)x_n,$
$\sqrt{n}\,\sigma\eta_{n+1} = x_1 + x_2 + x_3 + x_4 + \cdots + x_{n-1} + x_n.$
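The Helmert transformation (3) can be written down as a matrix and checked numerically. This is a minimal sketch with $\sigma = 1$; the helper names `helmert_rows` and `transform` and the sample values are hypothetical.

```python
import math

def helmert_rows(n):
    """Coefficient rows for eta_2, ..., eta_n, eta_{n+1} of (3), taking sigma = 1."""
    rows = []
    for r in range(2, n + 1):
        c = 1.0 / math.sqrt(r * (r - 1))
        # -x_1 - ... - x_{r-1} + (r - 1) x_r, scaled by 1 / sqrt(r (r - 1))
        rows.append([-c] * (r - 1) + [(r - 1) * c] + [0.0] * (n - r))
    rows.append([1.0 / math.sqrt(n)] * n)   # eta_{n+1}: proportional to the mean
    return rows

def transform(rows, x):
    """Apply the transformation to a sample x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

x = [1.0, 2.5, 2.7, 3.1, 6.0]               # illustrative ordered sample
rows = helmert_rows(len(x))
eta = transform(rows, x)
xbar = sum(x) / len(x)
print(sum(e * e for e in eta[:-1]), sum((v - xbar) ** 2 for v in x))
```

The two printed values coincide: the sum of squares of $\eta_2, \ldots, \eta_n$ equals the sum of squared deviations from the sample mean, which is why the last variable $\eta_{n+1}$ can be integrated out separately.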


The above transformation will lead to the distribution of the difference
between the extreme and sample mean in terms of the unknown population $\sigma$ for
samples of $n$ from a normal parent. Since, however, K. R. Nair (Biometrika,
May, 1948) has already published the details independently, we will only
record here for later reference that the density function of $\eta_2, \eta_3, \ldots, \eta_n$ (after
integrating $\eta_{n+1}$ over $-\infty < \eta_{n+1} < +\infty$) is

(4)  $dF(\eta_2, \eta_3, \ldots, \eta_n) = \frac{n!}{(2\pi)^{(n-1)/2}} \exp\left[-\frac{1}{2}\sum_{i=2}^{n}\eta_i^2\right] d\eta_2\,d\eta_3 \cdots d\eta_n$

where the $\eta_i$ are restricted by the relations

(5)  $\infty > \eta_2 \ge 0, \quad \sqrt{\frac{r}{r-2}}\,\eta_r \ge \eta_{r-1}, \quad r \ge 3.$

Upon making the transformations

(6)  $\sqrt{r(r-1)}\,\eta_r = r\,u_r \quad (r = 2, 3, \ldots, n),$

defining

(7)  $F_n(u) = \int_0^u dF(u_n)$ = probability $u_n \le u$,





and integrating the $u_r$ over their appropriate ranges we find the cumulative
probability integrals of the extreme deviation from the sample mean (in terms
of the population $\sigma$) for $n = 2, 3, \ldots$ to be

$$F_2(u) = 2\sqrt{2}\int_0^u \frac{1}{\sqrt{2\pi}}\,e^{-x^2}\,dx = \frac{2}{\sqrt{\pi}}\int_0^u e^{-x^2}\,dx,$$

a well-known result, where for n = 2, $x$ is either the sample standard deviation,
the difference between the extreme and sample mean, the mean deviation or the
semi-range.

(8)  $F_n(u) = \frac{n\sqrt{n}}{\sqrt{n-1}}\int_0^u \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\frac{n}{n-1}x^2}\,F_{n-1}\left(\frac{n}{n-1}x\right)dx.$

This is equivalent to the result of McKay [11], although the derivation
indicated is a considerably simpler one.
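Recurrence (8) lends itself to direct numerical quadrature. The sketch below assumes the form $F_n(u) = \frac{n\sqrt{n}}{\sqrt{n-1}}\int_0^u \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\frac{n}{n-1}x^2} F_{n-1}\left(\frac{n}{n-1}x\right)dx$ with $F_1 \equiv 1$, and applies composite Simpson's rule; the function names and the step count are hypothetical, and this is not the quadrature process used for the published tables.

```python
import math

def F2(u):
    """F_2(u) = (2 / sqrt(pi)) * integral_0^u exp(-x^2) dx, i.e. erf(u)."""
    return math.erf(u) if u > 0 else 0.0

def F_next(F_prev, n, u, steps=400):
    """One step of recurrence (8) by composite Simpson's rule (steps must be even)."""
    if u <= 0:
        return 0.0
    c = n * math.sqrt(n / (n - 1.0)) / math.sqrt(2.0 * math.pi)

    def g(x):
        return math.exp(-0.5 * n / (n - 1.0) * x * x) * F_prev(n * x / (n - 1.0))

    h = u / steps
    total = g(0.0) + g(u)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return c * total * h / 3.0

def F3(u):
    return F_next(F2, 3, u)

print(F3(1.738))
```

The value printed for $u = 1.738$ should be close to 0.95, in line with the 95% point listed for n = 3 in Table III. Nesting `F_next` for larger $n$ works in principle but becomes slow, since each level re-evaluates the previous one.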

Now $F_{n-1}(u)$ increases from 0 to 1 as $u$ increases from 0 to $\infty$. Hence, if
$F_{n-1}\left(\frac{n}{n-1}u\right)$ is practically unity, i.e. for $\frac{n}{n-1}u$ numerically large, the
upper percentage points of $u_n$ may be approximated by the normal integral

(9)  $\frac{n\sqrt{n}}{\sqrt{n-1}}\,\frac{1}{\sqrt{2\pi}}\int_{u_n}^{\infty}\exp\left[-\frac{1}{2}\frac{n}{n-1}x^2\right]dx = \frac{n}{\sqrt{2\pi}}\int_{\sqrt{n/(n-1)}\,u_n}^{\infty}\exp\left[-\frac{t^2}{2}\right]dt.$

Formula (9) was found to be particularly useful in checking the higher
probabilities in Table II.
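Formula (9) amounts to $n$ times an upper normal tail area. A minimal sketch, assuming the form $\frac{n}{\sqrt{2\pi}}\int_{\sqrt{n/(n-1)}\,u}^{\infty} e^{-t^2/2}\,dt$ and expressing the tail through the complementary error function; the function name is hypothetical.

```python
import math

def upper_tail_approx(n, u):
    """Approximate P(u_n > u * sigma) by n times the normal tail beyond sqrt(n/(n-1)) * u."""
    z = math.sqrt(n / (n - 1.0)) * u
    return n * 0.5 * math.erfc(z / math.sqrt(2.0))

print(upper_tail_approx(25, 3.282))
print(upper_tail_approx(10, 2.931))
```

Both arguments are 99% points from Table III, so each result should come out close to .01. The approximation slightly overstates the tail, because it replaces $F_{n-1}$ by unity inside the integral of (8).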

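The cumulative distribution functions (8) may be put into another form by
setting

$$x_r = r\,u_r, \quad r = 2, 3, \ldots, n.$$

Then $F_n(u)$ becomes

(10)  $F_n(u) = \frac{\sqrt{n}}{(2\pi)^{(n-1)/2}}\int_0^{nu}\int_0^{x_n}\cdots\int_0^{x_3}\exp\left[-\frac{1}{2}\sum_{r=2}^{n}\frac{x_r^2}{r(r-1)}\right]dx_2 \cdots dx_{n-1}\,dx_n.$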




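Define the following functions:

$H_1(x) = 1,$

$H_2(x) = \sqrt{\frac{2}{1}}\int_0^x \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\,\frac{t^2}{2 \cdot 1}\right]H_1(t)\,dt,$

$\cdots$

$H_n(x) = \sqrt{\frac{n}{n-1}}\int_0^x \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\,\frac{t^2}{n(n-1)}\right]H_{n-1}(t)\,dt.$

Hence, the probability that the difference between the extreme and the mean
in samples of $n$ from a normal population is less than $u\sigma$ is given by the
alternative forms

$$P\{u_n < u\sigma\} = F_n(u) = H_n(nu).$$

Of course, $H_n(nu) \to 1$ as $u \to \infty$ for any given $n$.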
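One attraction of the $H$ functions is that every $H_r$ shares the same argument grid, so a whole family can be tabulated by repeated cumulative quadrature. The sketch below assumes the recurrence $H_r(x) = \sqrt{\frac{r}{r-1}}\int_0^x \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{t^2}{2r(r-1)}\right]H_{r-1}(t)\,dt$ with $H_1 = 1$, and uses the trapezoidal rule; the function names, the step size and the grid limit are hypothetical choices, not the procedure used on the ENIAC.

```python
import math

STEP = 0.01

def tabulate_H(n_max, x_max, step=STEP):
    """Tabulate H_1 ... H_{n_max} on a common grid by cumulative trapezoidal sums."""
    m = int(round(x_max / step))
    grid = [k * step for k in range(m + 1)]
    H = {1: [1.0] * (m + 1)}
    for r in range(2, n_max + 1):
        c = math.sqrt(r / (r - 1.0)) / math.sqrt(2.0 * math.pi)
        prev = H[r - 1]
        f = [c * math.exp(-t * t / (2.0 * r * (r - 1))) * prev[k]
             for k, t in enumerate(grid)]
        cur = [0.0]
        for k in range(1, m + 1):
            cur.append(cur[-1] + 0.5 * step * (f[k - 1] + f[k]))
        H[r] = cur
    return H

def F_from_H(n, u, H, step=STEP):
    """F_n(u) = H_n(n u), read off the tabulated grid (n u must lie on the grid)."""
    return H[n][int(round(n * u / step))]

H = tabulate_H(10, 25.0)
print(F_from_H(2, 1.0, H), F_from_H(10, 2.20, H))
```

The second value can be compared with the Table II entry for n = 10 at u = 2.20 (.89998), and the first with the closed form for n = 2.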

In the November 1945 issue of Biometrika, Godwin [13] arrived at a series
of functions closely related to the $H_r(x)$ in connection with the distribution of
the mean deviation in samples of $n$ from a normal parent. In Godwin's work,
he defines functions $G_r(x)$ which are related to the $H_r(x)$ by the equation

(2 irY^HUx) = <?,(*). 

The $G_r(x)$ functions were computed by H. O. Hartley [15] for $r = 2, 3, \ldots, 9$
only. Computations on the functions $F_n(u)$, i.e. (8), were well under way by
the author before Godwin's article on the mean deviation appeared. The $H_r(x)$
or $G_r(x)$ can be used to obtain both the distribution of the difference between
the extreme and mean and also the probability integral of the mean deviation.
Indeed, it is believed that these functions may have a useful place in tabulating
distributions of order statistics.


6. Tabulation of the distribution function, $F_n(u)$.

The tabulation of the $F_n(u)$ with ordinary computing equipment is quite
laborious. However, a table model computing machine was used initially to
obtain the $F_n(u)$ for n = 2 to n = 15 using formulae (8) and a numerical
quadrature process.

Because of the possible general usefulness of the $H_r(x)$, these functions were
programmed on a high-speed computing device, the ENIAC (Electronic Numerical
Integrator and Computer) of the Ballistic Research Laboratories of the Ordnance
Department.⁴ In this connection, the $H_r(x)$ have been computed for r = 2 to
r = 25 at the Ballistic Research Laboratories. For n = 2, the functions $H_r(x)$
were computed to nine decimal places of accuracy on the ENIAC and at n = 25
about five decimal places of accuracy were obtained. In Table II we have
tabulated $F_n(u)$ or $H_n(nu)$, i.e. the probability integral of the extreme minus the
mean, at intervals of $u = .05\sigma$. Values computed on the table model computing
machine agreed to five decimal places at n = 15 with values from the ENIAC.
Percentage points of the distribution are given in Table III and the moment
constants may be found in Table IV. Moment constants for n = 60, 100, 200, 500
and 1000 were obtained by use of McKay's formulae [11] (which relate the
semi-invariants of $x_n - \bar{x}$ with those of $x_n$) and Tippett's moments [5] for the
largest observation $x_n$.

⁴ [...] to the [...]; however, due to problems of higher priority, the
functions were not computed on the ENIAC until March, 1948.


TABLE III

Percentage Points for Extreme Minus Mean

  n      90%      95%      99%     99.5%

  2     1.163    1.386    1.821    1.985
  3     1.497    1.738    2.215    2.396
  4     1.696    1.941    2.431    2.618
  5     1.835    2.080    2.574    2.764
  6     1.939    2.184    2.679    2.870
  7     2.022    2.267    2.761    2.952
  8     2.091    2.334    2.828    3.019
  9     2.150    2.392    2.884    3.074
 10     2.200    2.441    2.931    3.122
 11     2.245    2.484    2.973    3.163
 12     2.284    2.523    3.010    3.199
 13     2.320    2.557    3.043    3.232
 14     2.352    2.589    3.072    3.261
 15     2.382    2.617    3.099    3.287
 16     2.409    2.644    3.124    3.312
 17     2.434    2.668    3.147    3.334
 18     2.458    2.691    3.168    3.355
 19     2.480    2.712    3.188    3.375
 20     2.500    2.732    3.207    3.393
 21     2.519    2.750    3.224    3.409
 22     2.538    2.768    3.240    3.425
 23     2.555    2.784    3.255    3.439
 24     2.571    2.800    3.269    3.453
 25     2.587    2.815    3.282    3.465
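When $\sigma$ is known, the percentage points of Table III can be applied directly. The sketch below copies a few 95% and 99% points from the table (the column assignment is an assumption, checked only against the closed form for n = 2); the dictionary, the function name and the sample are hypothetical.

```python
# (95% point, 99% point) of (x_n - xbar) / sigma for a few n, copied from Table III.
TABLE_III = {5: (2.080, 2.574), 10: (2.441, 2.931), 20: (2.732, 3.207)}

def largest_exceeds_table(sample, sigma, level="95%"):
    """Compare (largest - mean) / sigma with the tabled percentage point."""
    n = len(sample)
    if n not in TABLE_III:
        raise KeyError("no percentage point stored for this sample size")
    crit = TABLE_III[n][0 if level == "95%" else 1]
    xbar = sum(sample) / n
    u = (max(sample) - xbar) / sigma
    return u, crit, u > crit

sample = [0.0, 0.0, 0.0, 0.0, 3.0]           # illustrative data, sigma taken as 1
print(largest_exceeds_table(sample, 1.0, "95%"))
print(largest_exceeds_table(sample, 1.0, "99%"))
```

Here the statistic is 2.4, which exceeds the 95% point for n = 5 but not the 99% point. The comparison is only meaningful when $\sigma$ is known independently of the sample; otherwise the criteria of Section 8 apply.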





TABLE IV

Moment Constants for Extreme Minus Mean

    n      Mean     Std. Dev.     α3        α4

    2     .5642      .4263      .9953     3.8692
    3     .8463      .4755      .8296     3.7135
    4    1.0294      .4916      .7675     3.6717
    5    1.1630      .4974      .7372     3.6560
    6    1.2672      .4993      .7165     3.6511
    7    1.3522      .4991      .7042     3.6503
    8    1.4236      .4979      .6959     3.6518
    9    1.4850      .4962      .6900     3.6546
   10    1.5388      .4943      .6857     3.6582
   11    1.5864      .4923      .6827     3.6622
   12    1.6292      .4902      .6804     3.6663
   13    1.6680      .4881      .6788     3.6705
   14    1.7034      .4861      .6777     3.6746
   15    1.7359      .4841      .6770     3.6787
   20    1.867       .475       .677      3.700
   60    2.319       .436       .699      3.801
  100    2.508       .418       .712      3.855
  200    2.746       .395       .737      3.932
  500    3.037       .368       .771      4.033
 1000    3.241       .350       .794      4.105


7. Relation between the distribution of the largest minus the mean of all
n observations and the largest minus the mean of the remaining n-1 items.
The following relation is of interest concerning these two statistics:

Let

$$u_n = x_n - \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\{(n-1)x_n - x_1 - x_2 - \cdots - x_{n-1}\}.$$

Let

$$v_n = x_n - \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} = \frac{1}{n-1}\{(n-1)x_n - x_1 - x_2 - \cdots - x_{n-1}\}.$$

Hence,

$$v_n = \frac{n}{n-1}\,u_n$$

and

$$P(v_n < k) = P\left\{\frac{n}{n-1}u_n < k\right\} = P\left\{u_n < \frac{n-1}{n}k\right\},$$

i.e. the probability integral of the largest minus the mean of the other
observations may be obtained by interpolation on the distribution of the largest minus
the mean of all $n$ items in the sample.
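The relation $v_n = \frac{n}{n-1}u_n$ is easy to confirm on any sample. In this sketch the function names and the data are hypothetical, and `F_n` stands for any callable giving the distribution function of $u_n$ in units of $\sigma$ (for n = 2 that function is `math.erf`).

```python
import math

def u_stat(sample):
    """Largest observation minus the mean of all n observations."""
    return max(sample) - sum(sample) / len(sample)

def v_stat(sample):
    """Largest observation minus the mean of the remaining n - 1 observations."""
    s = sorted(sample)
    return s[-1] - sum(s[:-1]) / (len(s) - 1)

def prob_v_less(k, n, F_n):
    """P(v_n < k) obtained from the distribution function F_n of u_n."""
    return F_n((n - 1.0) / n * k)

sample = [1.2, 0.4, 3.9, 2.2, 1.7]           # illustrative data
n = len(sample)
print(v_stat(sample), n / (n - 1) * u_stat(sample))
print(prob_v_less(2.0, 2, math.erf))
```

The first line prints the same number twice, and the last line evaluates $P(v_2 < 2)$ as $F_2(1)$.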

8. The distribution of $S_n^2/S^2$ and $S_1^2/S^2$. As indicated in the Summary, we
proposed the sample criterion

$$\frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1}(x_i - \bar{x}_n)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \bar{x}_n = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i,$$

for testing the significance of the largest observation and the criterion

$$\frac{S_1^2}{S^2} = \frac{\sum_{i=2}^{n}(x_i - \bar{x}_1)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \bar{x}_1 = \frac{1}{n-1}\sum_{i=2}^{n} x_i,$$

for testing whether the smallest observation is outlying. We now find the
probability distribution of $S_n^2/S^2$; hence, also that of $S_1^2/S^2$.
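The two criteria are straightforward to compute. The sketch below also checks them against the identity $\sum (x_i - \bar{x})^2 = \frac{n}{n-1}(x_n - \bar{x})^2 + \sum_{i<n}(x_i - \bar{x}_n)^2$ from Section 5, which gives $S_n^2/S^2 = 1 - \frac{n}{n-1}(x_n - \bar{x})^2/S^2$; the function names and the data are hypothetical.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean of the given values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def ratio_largest(sample):
    """S_n^2 / S^2: the largest observation is omitted from the numerator."""
    s = sorted(sample)
    return sum_sq_dev(s[:-1]) / sum_sq_dev(s)

def ratio_smallest(sample):
    """S_1^2 / S^2: the smallest observation is omitted from the numerator."""
    s = sorted(sample)
    return sum_sq_dev(s[1:]) / sum_sq_dev(s)

sample = [9.8, 10.1, 10.0, 9.9, 10.2, 11.5]   # illustrative data
print(ratio_largest(sample), ratio_smallest(sample))
```

A small value of the ratio means that dropping the extreme observation removes most of the variation, which is the evidence that the observation is outlying.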

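Returning to the density function

$$dF(\eta_2, \eta_3, \ldots, \eta_n) = \frac{n!}{(2\pi)^{(n-1)/2}}\exp\left[-\frac{1}{2}\sum_{i=2}^{n}\eta_i^2\right]d\eta_2\,d\eta_3 \cdots d\eta_n$$

of Section 5, we make the polar transformation

$\eta_2 = r \sin\theta_n \sin\theta_{n-1} \cdots \sin\theta_4 \sin\theta_3,$
$\eta_3 = r \sin\theta_n \sin\theta_{n-1} \cdots \sin\theta_4 \cos\theta_3,$
$\eta_4 = r \sin\theta_n \sin\theta_{n-1} \cdots \cos\theta_4,$
$\cdots$
$\eta_{n-1} = r \sin\theta_n \cos\theta_{n-1},$
$\eta_n = r \cos\theta_n.$

Then

$$\sum_{i=2}^{n}\eta_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 = r^2,$$

$$\sum_{i=2}^{n-1}\eta_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n-1}(x_i - \bar{x}_n)^2 = r^2\sin^2\theta_n.$$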




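Hence,

$$\sin^2\theta_n = \frac{\sum_{i=1}^{n-1}(x_i - \bar{x}_n)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_n^2}{S^2}.$$

The Jacobian of the above transformation is

$$r^{n-2}\sin^{n-3}\theta_n \sin^{n-4}\theta_{n-1} \cdots \sin^3\theta_6 \sin^2\theta_5 \sin\theta_4,$$

and since $0 \le r < \infty$,

$$dF(\theta_n, \ldots, \theta_4, \theta_3) = \frac{n!}{(2\pi)^{(n-1)/2}}\,2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)\sin^{n-3}\theta_n \cdots \sin^2\theta_5 \sin\theta_4\,d\theta_n \cdots d\theta_5\,d\theta_4\,d\theta_3.$$

Since the restrictions on the $\eta_i$ are

$$\eta_2 \ge 0, \quad \sqrt{\frac{r}{r-2}}\,\eta_r \ge \eta_{r-1}, \quad r \ge 3,$$

we have

$$\tan\theta_n \cos\theta_{n-1} \le \sqrt{\frac{n}{n-2}}, \quad n \ge 4,$$

or

$$\tan\theta_n \le \sqrt{\frac{n}{n-2}}\,\sec\theta_{n-1}, \quad n \ge 4,$$

$$0 \le \theta_3 \le \frac{\pi}{3}.$$

Thus, letting $K_n = \frac{n!}{(2\pi)^{(n-1)/2}}\,2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)$, we see that

(13)  $K_n\int_0^{\pi/3}\int_0^{l_3}\cdots\int_0^{l_{n-1}}\sin^{n-3}\theta_n \cdots \sin^2\theta_5 \sin\theta_4\,d\theta_n \cdots d\theta_4\,d\theta_3 = 1,$

where $l_r = \tan^{-1}\left(\sqrt{\frac{r+1}{r-1}}\,\sec\theta_r\right)$.

Upon reversing the order of integration (the variable limits are monotonic) we
get for n = 3

$$K_3\int_0^{\pi/3} d\theta_3 = 1,$$

so that

$$P(\theta_3 < \theta) = K_3\int_0^{\theta} d\theta_3, \quad 0 \le \theta \le M_3 = \tan^{-1}\sqrt{3}.$$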




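When n = 4, we obtain

$$K_4 \int_0^{m_4}\int_0^{\pi/3} \sin\theta_4 \, d\theta_3\, d\theta_4 + K_4 \int_{m_4}^{M_4}\int_{L_4}^{\pi/3} \sin\theta_4 \, d\theta_3\, d\theta_4 = 1,$$

where

$$m_r = \tan^{-1}\sqrt{\frac{r}{r-2}}, \qquad M_r = \tan^{-1}\sqrt{r(r-2)} \qquad \text{and} \qquad L_r = \sec^{-1}\sqrt{\frac{r-2}{r}}\,\tan\theta_r,$$

so that

$$P(\theta_4 < \theta) = \frac{K_4}{K_3}\int_0^{\theta} \sin\theta_4 \, d\theta_4 \quad \text{when } 0 \le \theta \le m_4 = \tan^{-1}\sqrt{2}, \tag{15a}$$

and

$$P(\theta_4 < \theta) = \frac{K_4}{K_3}\int_0^{m_4} \sin\theta_4 \, d\theta_4 + K_4 \int_{m_4}^{\theta}\int_{L_4}^{\pi/3} \sin\theta_4 \, d\theta_3\, d\theta_4 \tag{15b}$$

when $m_4 = \tan^{-1}\sqrt{2} \le \theta \le M_4 = \tan^{-1}\sqrt{4 \cdot 2}$.

When n = 5, we get

$$\begin{aligned}
&K_5 \int_0^{m_5}\int_0^{m_4}\int_0^{\pi/3} \sin^2\theta_5 \sin\theta_4 \, d\theta_3\, d\theta_4\, d\theta_5
+ K_5 \int_0^{m_5}\int_{m_4}^{M_4}\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4 \, d\theta_3\, d\theta_4\, d\theta_5 \\
&\quad + K_5 \int_{m_5}^{M_5}\int_{L_5}^{M_4}\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4 \, d\theta_3\, d\theta_4\, d\theta_5 = 1
\end{aligned}$$

(where $L_4 = \sec^{-1}\sqrt{\tfrac{1}{2}}\,\tan\theta_4$ is to be taken as 0 whenever $\theta_4 < m_4 = \tan^{-1}\sqrt{2}$),
so that

$$P(\theta_5 < \theta) = \frac{K_5}{K_4}\int_0^{\theta} \sin^2\theta_5 \, d\theta_5 \quad \text{when } 0 \le \theta \le m_5 = \tan^{-1}\sqrt{\tfrac{5}{3}}, \tag{16a}$$

and

$$P(\theta_5 < \theta) = \frac{K_5}{K_4}\int_0^{m_5} \sin^2\theta_5 \, d\theta_5 + K_5 \int_{m_5}^{\theta}\int_{L_5}^{M_4}\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4 \, d\theta_3\, d\theta_4\, d\theta_5, \tag{16b}$$

where $m_5 = \tan^{-1}\sqrt{\tfrac{5}{3}} \le \theta \le M_5 = \tan^{-1}\sqrt{5 \cdot 3}$,
and we put $L_4 = \sec^{-1}\sqrt{\tfrac{1}{2}}\,\tan\theta_4 = 0$ whenever $\theta_4 < m_4 = \tan^{-1}\sqrt{2}$.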




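For a sample of n items

$$P(\theta_n < \theta) = \frac{n}{2}\, I_{\sin^2\theta}\left(\frac{n-2}{2}, \frac{1}{2}\right) \quad \text{when } 0 \le \theta \le m_n = \tan^{-1}\sqrt{\frac{n}{n-2}}; \tag{17a}$$

$$P(\theta_n < \theta) = \frac{n}{2}\, I_{n/(2(n-1))}\left(\frac{n-2}{2}, \frac{1}{2}\right) + K_n \int_{m_n}^{\theta}\int_{L_n}^{M_{n-1}} \cdots \int_{L_4}^{\pi/3} \sin^{n-3}\theta_n \cdots \sin\theta_4 \, d\theta_3\, d\theta_4 \cdots d\theta_n \tag{17b}$$

when $m_n = \tan^{-1}\sqrt{\dfrac{n}{n-2}} \le \theta \le M_n = \tan^{-1}\sqrt{n(n-2)}$,

where $I_x(p, q)$ is K. Pearson’s Incomplete Beta Function Ratio [19]. It is to be
understood in (17) that

$$L_i = \sec^{-1}\sqrt{\frac{i-2}{i}}\,\tan\theta_i \quad \text{for } i = 4, 5, \ldots, n-1$$

is to be taken as zero when $\theta_i < \tan^{-1}\sqrt{\dfrac{i}{i-2}}$.

Percentage points for the sample statistic

$$\sin^2\theta_n = \frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1}(x_i - \bar{x}_n)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

or the statistic $S_1^2/S^2$ are given in Table I and were obtained by inverse interpolation
on the tabulation of the probability integral (17) above. Percentage
points for the Pearson and Chandra Sekar statistics, $T_n = \dfrac{x_n - \bar{x}}{s}$ or
$T_1 = \dfrac{\bar{x} - x_1}{s}$ (where $s^2 = \dfrac{1}{n}\sum (x_i - \bar{x})^2$), are given in Table IA. The statistics
$S_n^2/S^2$ and $T_n$ are related by the formula

$$\frac{S_n^2}{S^2} = 1 - \frac{T_n^2}{n-1}.$$

* It has been noted that (17a) gives a good approximation to (17b) when $\theta > \tan^{-1}\sqrt{\dfrac{n}{n-2}}$
provided we are interested in the important practical region P < .10, at least for n < 26.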





The statistic $T_n$ (or $T_1$) is easier to compute than $S_n^2/S^2$ (or $S_1^2/S^2$). The tabulation
of the multiple integral (17) was carried out on the Bell Relay Computers
at the Ballistic Research Laboratories.
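For values of the ratio small enough to fall in the range of (17a), the probability integral reduces to a single quadrature. The following is a minimal sketch in Python, assuming the constant $K_n/K_{n-1} = n\,\Gamma(\frac{n-1}{2}) / (\sqrt{\pi}\,\Gamma(\frac{n-2}{2}))$ implied by the normalization of (13); the function name and the use of Simpson's rule are illustrative choices, not the method used for the published tables.

```python
import math


def single_outlier_tail_prob(k, n, steps=2000):
    """P(S_n^2 / S^2 < k) from (17a), valid only for k <= n / (2(n - 1)).

    Evaluates (K_n / K_{n-1}) * integral_0^theta sin^(n-3)(t) dt with
    theta = arcsin(sqrt(k)), using composite Simpson's rule.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if not 0.0 <= k <= n / (2.0 * (n - 1)):
        raise ValueError("k lies outside the range where (17a) applies")
    theta = math.asin(math.sqrt(k))
    # Assumed constant: K_n / K_{n-1} = n Gamma((n-1)/2) / (sqrt(pi) Gamma((n-2)/2)).
    ratio = n * math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))

    def integrand(t):
        return math.sin(t) ** (n - 3)

    h = theta / steps
    total = integrand(0.0) + integrand(theta)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return ratio * total * h / 3


# The observed ratio 0.4931 for n = 15 is the value quoted in Example 1 of Section 11.
p_example = single_outlier_tail_prob(0.4931, 15)
```

Here `p_example` should fall between 0.01 and 0.025, in agreement with the bracket read from Table I in Example 1.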

9. The distribution of $S_{n-1,n}^2/S^2$ and $S_{1,2}^2/S^2$. As indicated in the Summary,
the proposed criterion for judging the significance of the two largest observations
is

$$\frac{S_{n-1,n}^2}{S^2} = \frac{\sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} < k, \quad \text{where } \bar{x}_{n-1,n} = \frac{1}{n-2}\sum_{i=1}^{n-2} x_i,$$

and that for testing the two smallest observations is

$$\frac{S_{1,2}^2}{S^2} = \frac{\sum_{i=3}^{n}(x_i - \bar{x}_{1,2})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} < k, \quad \text{where } \bar{x}_{1,2} = \frac{1}{n-2}\sum_{i=3}^{n} x_i.$$

From the preceding section, we note that

$$\sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2 = \sum_{i=2}^{n-2}\eta_i^2 = r^2 \sin^2\theta_n \sin^2\theta_{n-1}.$$

Hence,

$$\sin^2\theta_n \sin^2\theta_{n-1} = \frac{\sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \tag{18}$$

so that if we find the distribution of

$$\sin^2\theta_n \sin^2\theta_{n-1} = \sin^2 A_n, \text{ say,}$$

then we have the distribution of $S_{n-1,n}^2/S^2$ and hence also that of $S_{1,2}^2/S^2$, i.e.

$$P\{\sin^2 A_n < k\} = P\{A_n < \sin^{-1}\sqrt{k}\}. \tag{19}$$
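Relation (18) rests on the partial sums of squares of the $\eta_i$. The following is a minimal numerical sketch in Python, assuming that the $\eta_i$ of Section 5 are the Helmert components $\eta_r = \{(r-1)x_r - x_1 - \cdots - x_{r-1}\}/\sqrt{r(r-1)}$, the same form as the transformation (24) below; the function names are hypothetical.

```python
import math


def helmert_components(xs):
    """Return [eta_2, ..., eta_n] for the observations xs = [x_1, ..., x_n].

    Assumed form: eta_r = ((r - 1) x_r - (x_1 + ... + x_{r-1})) / sqrt(r (r - 1)).
    """
    etas = []
    running_sum = 0.0
    for r, x in enumerate(xs, start=1):
        if r >= 2:
            etas.append(((r - 1) * x - running_sum) / math.sqrt(r * (r - 1)))
        running_sum += x
    return etas


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Ordered ranges from Example 2, used only as test data.
ordered = sorted([4782, 4420, 4838, 4803, 4765, 4730, 4549, 4833])
n = len(ordered)
etas = helmert_components(ordered)

# eta_2 .. eta_{n-2} are the first n - 3 entries of the list.
ratio_from_etas = sum(e * e for e in etas[: n - 3]) / sum(e * e for e in etas)
ratio_direct = sum_sq_dev(ordered[: n - 2]) / sum_sq_dev(ordered)
```

The two ratios should coincide, which is the content of (18) with $r^2 = \sum_{i=2}^{n}\eta_i^2$.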

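Returning to the multiple integral (13), let

$$\sin A_n = \sin\theta_n \sin\theta_{n-1},$$

$$A_i = \theta_i, \qquad 3 \le i \le n-1.$$

The Jacobian of this transformation is given by

$$\frac{\partial(\theta_n, \ldots, \theta_3)}{\partial(A_n, \ldots, A_3)} = \frac{\cos A_n}{\sqrt{\sin^2 A_{n-1} - \sin^2 A_n}}.$$

The limits of integration for $A_n$ are given by

$$0 \le A_n \le \sin^{-1}\frac{\sqrt{n}\,\sin A_{n-1}}{\sqrt{2(n-1) - (n-2)\sin^2 A_{n-1}}},$$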




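and, of course, those for $A_{n-1}, \ldots, A_3$ are the same as the limits for $\theta_{n-1}, \ldots, \theta_3$,
respectively. Hence, substituting in (13), we obtain

$$K_n \int_0^{\pi/3}\int_0^{\tan^{-1}\sqrt{2}\,\sec A_3} \cdots \int_0^{\tan^{-1}\sqrt{\frac{n-1}{n-3}}\,\sec A_{n-2}} \int_0^{\sin^{-1}\frac{\sqrt{n}\,\sin A_{n-1}}{\sqrt{2(n-1)-(n-2)\sin^2 A_{n-1}}}} \frac{\sin^{n-3}A_n \,\sin^{n-4}A_{n-1} \cdots \sin^2 A_5 \,\sin A_4 \,\cos A_n}{\sin^{n-3}A_{n-1}\sqrt{\sin^2 A_{n-1} - \sin^2 A_n}} \, dA_n \cdots dA_3 = 1. \tag{20}$$

Reversing the order of integration, we have

$$K_n \int_0^{\sin^{-1}\sqrt{\frac{n(n-3)}{(n-1)(n-2)}}} \int_{\sin^{-1}\frac{\sqrt{2(n-1)}\,\sin A_n}{\sqrt{n+(n-2)\sin^2 A_n}}}^{M_{n-1}} \int_{L_{n-1}}^{M_{n-2}} \cdots \int_{L_4}^{\pi/3} \frac{\sin^{n-3}A_n \,\sin^{n-4}A_{n-1} \cdots \sin A_4 \,\cos A_n}{\sin^{n-3}A_{n-1}\sqrt{\sin^2 A_{n-1} - \sin^2 A_n}} \, dA_3 \cdots dA_n = 1 \tag{21}$$

(for $A_i < \tan^{-1}\sqrt{\dfrac{i}{i-2}}$, $\sec^{-1}\sqrt{\dfrac{i-2}{i}}\,\tan A_i$ is to be put equal to
zero, where $i \ge 4$), so that for n = 4,

$$P(A_4 < A) = K_4 \int_0^{A}\int_{\sin^{-1}\frac{\sqrt{3}\,\sin A_4}{\sqrt{2+\sin^2 A_4}}}^{\pi/3} \frac{\sin A_4 \cos A_4}{\sin A_3 \sqrt{\sin^2 A_3 - \sin^2 A_4}} \, dA_3\, dA_4, \tag{22}$$

where $0 \le A \le \sin^{-1}\sqrt{\tfrac{2}{3}}$,
and for n = 5,

$$P(A_5 < A) = K_5 \int_0^{A}\int_{\sin^{-1}\frac{2\sqrt{2}\,\sin A_5}{\sqrt{5+3\sin^2 A_5}}}^{\tan^{-1}\sqrt{8}} \int_{\sec^{-1}\sqrt{\frac{1}{2}}\tan A_4}^{\pi/3} \frac{\sin^2 A_5 \cos A_5}{\sin A_4 \sqrt{\sin^2 A_4 - \sin^2 A_5}} \, dA_3\, dA_4\, dA_5, \tag{23}$$

where $0 \le A \le \sin^{-1}\sqrt{\tfrac{5}{6}}$, etc.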

We remark that an obvious extension of the above principles should lead to
the distributions of

$S_{n-2,n-1,n}^2/S^2$ and $S_{1,2,3}^2/S^2$,

$S_{n-3,n-2,n-1,n}^2/S^2$ and $S_{1,2,3,4}^2/S^2$,

etc., although the tabulation of such probability integrals may be exceedingly
difficult.

The problem of tabulating the probability integral (21) involves a double
quadrature process and has been carried out on the Bell Relay Computers at
the Ballistic Research Laboratories for n = 4 to n = 20, inclusive. Table V
gives some useful percentage points for these sample sizes.





TABLE V

Table of Percentage Points for $S_{n-1,n}^2/S^2$ or $S_{1,2}^2/S^2$

   n       1%      2.5%      5%      10%
   4     .0000    .0002    .0008    .0031
   5     .0035    .0090    .0183    .0376
   6     .0180    .0349    .0565    .0921
   7     .0440    .0708    .1020    .1479
   8     .0750    .1101    .1478    .1994
   9     .1082    .1492    .1909    .2454
  10     .1415    .1865    .2305    .2863
  11     .1730    .2212    .2666    .3226
  12     .2044    .2536    .2996    .3552
  13     .2333    .2836    .3295    .3843
  14     .2605    .3112    .3568    .4106
  15     .2869    .3367    .3818    .4345
  16     .3098    .3603    .4048    .4562
  17     .3321    .3822    .4259    .4761
  18     .3530    .4025    .4455    .4944
  19     .3725    .4214    .4630    .5113
  20     .3900    .4391    .4804    .5269

$S^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$;

$S_{n-1,n}^2 = \sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2$, where $\bar{x}_{n-1,n} = \dfrac{1}{n-2}\sum_{i=1}^{n-2} x_i$;

$S_{1,2}^2 = \sum_{i=3}^{n}(x_i - \bar{x}_{1,2})^2$, where $\bar{x}_{1,2} = \dfrac{1}{n-2}\sum_{i=3}^{n} x_i$.

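10. Comment on the distribution of $S_{1,n}^2/S^2$. In connection with the distribution
of the statistic

$$\frac{S_{1,n}^2}{S^2} = \frac{\sum_{i=2}^{n-1}(x_i - \bar{x}_{1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \quad \text{where } \bar{x}_{1,n} = \frac{1}{n-2}\sum_{i=2}^{n-1} x_i,$$

for testing simultaneously whether the smallest and largest observations are
outlying, an investigation indicates that since

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n-1}{n}(x_n - \bar{x}_n)^2 + \frac{n-2}{n-1}(x_1 - \bar{x}_{1,n})^2 + \frac{n-3}{n-2}\left(x_{n-1} - \frac{x_2 + \cdots + x_{n-2}}{n-3}\right)^2 + \cdots + \frac{2}{3}\left(x_4 - \frac{x_2 + x_3}{2}\right)^2 + \frac{1}{2}(x_3 - x_2)^2,$$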






then the transformation

$$\begin{aligned}
\sqrt{2 \cdot 1}\, v_2 &= -x_2 + x_3,\\
\sqrt{3 \cdot 2}\, v_3 &= -x_2 - x_3 + 2x_4,\\
\sqrt{4 \cdot 3}\, v_4 &= -x_2 - x_3 - x_4 + 3x_5,\\
&\;\;\vdots\\
\sqrt{(n-2)(n-3)}\, v_{n-2} &= -x_2 - x_3 - \cdots - x_{n-2} + (n-3)x_{n-1},\\
\sqrt{(n-1)(n-2)}\, v_{n-1} &= -(n-2)x_1 + x_2 + x_3 + \cdots + x_{n-1},\\
\sqrt{n(n-1)}\, v_n &= -x_1 - x_2 - x_3 - \cdots - x_{n-1} + (n-1)x_n,\\
\sqrt{n}\, v_{n+1} &= x_1 + x_2 + \cdots + x_n,
\end{aligned} \tag{24}$$

followed by transformations of the type (11) and that of Section 9 may lead
to the distribution of $S_{1,n}^2/S^2$. However, the limits of integration do not turn
out to be functions of single variables and the task of computing the resulting
multiple integral may be rather difficult.


11. Examples on testing outlying observations for rejection. We now turn
to the problem of applying our theory to particular practical examples of data
which appear to have outlying observations. Apparently, in the following examples
there were not sufficient practical or experimental grounds to reject
the suspected outliers and hence some statistical judgement became necessary
either to support retaining the “outliers” in the sample or leave little doubt
that certain of the observations should be questioned.

Example 1. Our first example has almost become a classical one as Irwin
[3], Rider [2], and other writers on the subject including Chauvenet, Peirce,
Gould, etc. (see Rider’s survey [2]) all refer to it, applying their various tests.
The example consists of a sample of 15 observations of the vertical semi-diameters
of Venus made by Lieut. Herndon in 1846 and is given in William
Chauvenet’s A Manual of Spherical and Practical Astronomy, II (5th ed.,
1876), p. 562. The individual residuals or deviations from the mean are:


-0.30"    0.48     0.63    -0.22     0.18
-0.44    -0.24    -0.13    -0.05     0.39
 1.01     0.06    -1.40     0.20     0.10

Arranging these observations in increasing order of magnitude (reading down the columns), we have

-1.40"   -0.24    -0.05     0.18     0.48
-0.44    -0.22     0.06     0.20     0.63
-0.30    -0.13     0.10     0.39     1.01



and it is seen that two of the residuals, -1.40 and 1.01, appear to be outliers.
Rider [2] indicates that the above observations have been referred to by previous
writers as “residuals”; nevertheless their sum is 0.27, so that the sample
mean $\bar{x} = .018$. Let us apply the exact test, i.e. $T_1$ of Pearson and Chandra
Sekar or $S_1^2/S^2$ as developed in Section 8 for a single outlier, to the least observation,
-1.40. We find $x_1 = -1.40$, $\bar{x} = .018$ and $s = .532$ (alternatively, we
find $S^2 = 4.2496$ using all 15 observations and $S_1^2 = 2.0953$ which is based on
14 observations, the suspected outlier -1.40 not being included). Further,

$$T_1 = \frac{\bar{x} - x_1}{s} = \frac{.018 + 1.40}{.532} = 2.665 \quad (\text{or } S_1^2/S^2 = 0.4931)$$

and from Table IA (or Table I) we see that 0.01 < P < 0.025 so that we would reject the observation
-1.40 when using the 5% level of significance. Having rejected -1.40,
we now have left a sample of 14 observations and test the greatest one, i.e. 1.01.
For $T_n$ based on the remaining 14 observations, we have n = 14, $x_n = 1.01$,
$\bar{x} = .119$ and $s = .387$ (alternatively, for the new sums of squares, we find
$S_n^2 = 1.2409$ leaving out 1.01 and $S^2 = 2.0953$ including the observation 1.01).
Hence,

$$T_n = \frac{x_n - \bar{x}}{s} = \frac{1.01 - .119}{.387} = 2.302 \quad (\text{or } S_n^2/S^2 = 0.5922)$$

and from Table IA (or I), we find P slightly less than .10, so that we decide to retain the
observation 1.01.
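The arithmetic of Example 1 can be reproduced directly from the fifteen residuals. The following is a minimal sketch in Python; the function names are hypothetical, and $s$ is computed with divisor $n$, as in the definition given in Section 8.

```python
import math

# The 15 residuals of Example 1 (seconds of arc).
residuals = [-0.30, 0.48, 0.63, -0.22, 0.18,
             -0.44, -0.24, -0.13, -0.05, 0.39,
             1.01, 0.06, -1.40, 0.20, 0.10]


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def low_outlier_statistics(xs):
    """Return (T_1, S_1^2 / S^2) for the smallest observation."""
    ordered = sorted(xs)
    n = len(ordered)
    mean = sum(ordered) / n
    total = sum_sq_dev(ordered)
    s = math.sqrt(total / n)          # divisor n, not n - 1
    return (mean - ordered[0]) / s, sum_sq_dev(ordered[1:]) / total


def high_outlier_statistics(xs):
    """Return (T_n, S_n^2 / S^2) for the largest observation."""
    ordered = sorted(xs)
    n = len(ordered)
    mean = sum(ordered) / n
    total = sum_sq_dev(ordered)
    s = math.sqrt(total / n)
    return (ordered[-1] - mean) / s, sum_sq_dev(ordered[:-1]) / total


# Step 1: test the least observation, -1.40, in the full sample of 15.
t_1, ratio_1 = low_outlier_statistics(residuals)

# Step 2: drop -1.40 and test the greatest observation, 1.01, among the other 14.
remaining = sorted(residuals)[1:]
t_n, ratio_n = high_outlier_statistics(remaining)
```

Without intermediate rounding `t_1` comes out near 2.664 rather than 2.665, since the text divides by the rounded value s = .532; the ratios agree with 0.4931 and 0.5922 to four decimals, and each satisfies the identity $S^2_{\cdot}/S^2 = 1 - T^2/(n-1)$.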

It would have been interesting nevertheless to see whether or not the test
$S_{1,n}^2/S^2$ would have rejected simultaneously the observations -1.40 and 1.01
if percentage points for the distribution of this statistic were available.

It is of interest to remark that for this particular example Irwin [3, page
245], using the difference between the first two individuals divided by an estimate
of σ, i.e. $\dfrac{x_2 - x_1}{\sigma}$, concluded also that -1.40 but not 1.01 should be rejected.
In testing both of these observations, Irwin used the single biased estimate
for σ (assuming $\bar{x} = 0$), based on all 15 observations. It is a mere coincidence,
of course, that for this example Irwin’s test gives the same result as the exact test
or the test based on the ratio $S_1^2/S^2$. In this connection, Irwin rightly calls
attention to the fact that in dealing with a sample of only 15 observations the
standard deviation of the sample is a very unreliable estimate of the population
standard deviation.

It is remarked that here we would, of course, hesitate to apply the test $\dfrac{\bar{x} - x_1}{\sigma}$
to the observation -1.40 as we do not have available an accurate estimate of
σ from past data.

Example 2. The following ranges (horizontal distances from gun muzzle to 
point of impact) were obtained in firing projectiles from a weapon at a constant 
angle of elevation and at the same weight of charge of propellant powder: 





Distances in yards 


4782    4420
4838    4803
4765    4730
4549    4833


It is desired to know whether the projectiles exhibit uniformity in ballistic 
behavior or if some of the ranges, such as 4549 and 4420, are not consistent 
with the others. 

Arranging the distances or ranges in increasing order of magnitude (reading down the columns),

4420    4782
4549    4803
4730    4833
4765    4838


we suspect the presence of two outliers, i.e. 4420 and 4549. Having no available
knowledge of σ from past data for this example, an intuitively efficient test to
apply would be that of Section 9, i.e. $S_{1,2}^2/S^2$.

We find

$$\frac{S_{1,2}^2}{S^2} = \frac{\sum_{i=3}^{8}(x_i - \bar{x}_{1,2})^2}{\sum_{i=1}^{8}(x_i - \bar{x})^2} = .054,$$

which is significant at the .01 level (Table V) and consequently we would judge
the distances 4420 and 4549 yds. as being unusually low.

As a matter of interest and as a recommended temporary practical expedient
for testing several “outliers”, consider for example the last seven of the above
ordered observations,

4549    4803
4730    4833
4765    4838
4782

and apply the exact test, $S_1^2/S^2$, to the smallest observation, 4549. We find
$S_1^2/S^2 = .145$ so that .01 < P < .025 from Table I and we should thus reject
4549 from the sample of seven. Moreover, we should now surely reject 4420
as being outlying, arriving at the same result we had for the test $S_{1,2}^2/S^2$. Thus,
as a general temporary expedient in testing for “outliers” one could rank the
observations, and apply the tests $S_1^2/S^2$ (or $S_n^2/S^2$) and $S_{1,2}^2/S^2$ (or $S_{n-1,n}^2/S^2$),
thus working from the “inside” observations of the ranked sample in order to
establish consistency of the observations.
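Both ratios quoted in Example 2 follow from the eight ranges. The following is a minimal sketch in Python, with hypothetical helper names; it computes the two-outlier ratio on the full sample and the single-outlier ratio on the seven largest observations.

```python
# Ranges in yards from Example 2.
ranges = [4782, 4420, 4838, 4803, 4765, 4730, 4549, 4833]


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def smallest_ratio(xs, dropped):
    """Ratio of the sum of squares omitting the `dropped` smallest
    observations to the sum of squares of the whole sample."""
    ordered = sorted(xs)
    return sum_sq_dev(ordered[dropped:]) / sum_sq_dev(ordered)


# S_{1,2}^2 / S^2 on all eight observations (test of Section 9).
two_low_ratio = smallest_ratio(ranges, dropped=2)

# S_1^2 / S^2 on the last seven ordered observations (4420 set aside).
last_seven = sorted(ranges)[1:]
one_low_ratio = smallest_ratio(last_seven, dropped=1)
```

The first ratio is about .054, below the 1% point .0750 listed for n = 8 in Table V; the second is about .145, as stated in the text.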





12. Additional comments. Although we have used a significance level of .05
in the examples, it may be preferable from a practical viewpoint to reject outlying
observations only at a lower level, such as .01 or .005.

Extensions of the ideas for testing outlying observations presented in this
paper may lead to efficient sample criteria for testing the significance of various
numbers of high, low, or simultaneously high and low sample values. However,
the mathematical details would probably be complicated. In this connection,
it is remarked nevertheless that the advent of high-speed computing devices
may have considerable bearing on establishing experimentally any probability
distribution. That is to say, high-speed electronic computing devices could probably
be programmed to generate random numbers with frequencies equal to those
of the normal (or any other) distribution, to compute various functions (such as
ratios in this paper) of sample values, etc., and establish frequency distributions
to a desired order of accuracy.
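The experimental approach suggested above is easy to sketch on a present-day machine. The following minimal Python example, which is an illustration and not the procedure used for Table V, draws normal samples with the standard library generator and estimates a lower percentage point of $S_{n-1,n}^2/S^2$ empirically; the function names, the seed, and the number of trials are arbitrary assumptions.

```python
import random


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def two_largest_ratio(xs):
    """S_{n-1,n}^2 / S^2: sum of squares omitting the two largest
    observations, divided by the sum of squares of the whole sample."""
    ordered = sorted(xs)
    return sum_sq_dev(ordered[:-2]) / sum_sq_dev(ordered)


def simulated_percentage_point(n, prob, trials=20000, seed=1950):
    """Empirical lower `prob` point of the ratio for normal samples of size n."""
    rng = random.Random(seed)
    ratios = sorted(
        two_largest_ratio([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return ratios[int(prob * trials)]


estimate = simulated_percentage_point(n=5, prob=0.05)
```

With 20,000 trials the estimate for n = 5 should land close to the tabled 5% point .0183, the sampling error being of the order of .001; more trials tighten the agreement.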

13. Acknowledgement. The author is greatly indebted to Prof. C. C. Craig
under whose most competent guidance this work was carried out. Indeed, the
stimulus and encouragement received by the author as a result of Prof. Craig’s
interest in the problem were paramount in orienting various phases of the subject
matter and in accomplishing the results given herein. In connection with
the computing, debts of gratitude are owed to several members of the staff
of the Ballistic Research Laboratories. It is desired to express appreciation to
Col. Leslie E. Simon, Director of the Ballistic Research Laboratories, for recognizing
the importance and desirability of carrying out the computations on
high-speed computing devices. In this connection, appreciation is also expressed
to Dr. L. S. Dederick, Chief, Computing Laboratory, Ballistic Research Laboratories.
The programming of the $H_n(u)$ functions on the Electronic Numerical
Integrator and Computer was done by Dr. Derrick Lehmer of the University
of California, who was with the Computing Laboratory, BRL, during the latter
part of World War II, and Miss Ruth Lichterman. The author is particularly
indebted to Dr. Franz Alt, Dr. Bernard Dimsdale, Miss R. Lichterman, Mr.
John Holberton, Miss H. Marks, Mr. F. Spence, and others of the Computing
Laboratory during the period the functions $H_n(x)$ were computed on
the ENIAC. The computing of the distribution of the statistics $S_n^2/S^2$
and $S_{n-1,n}^2/S^2$ was done on the Bell Relay Computers under the direction of
Mr. J. O. Harrison, Mrs. M. Masincup and Mr. E. Cushen. The author also
desires to express appreciation to Miss Helen J. Coon for considerable computation
and checking carried out on a table-model computing machine.

REFERENCES 

[1] E. S. Pearson and C. Chandra Sekar, “The efficiency of statistical tools and a criterion for the rejection of outlying observations”, Biometrika, Vol. 28 (1936), pp. 308-320.

[2] P. R. Rider, “Criteria for rejection of observations”, Washington University Studies, New Series, Science and Technology, No. 8, St. Louis (1933).

[3] J. O. Irwin, “On a criterion for the rejection of outlying observations”, Biometrika, Vol. 17 (1925), pp. 238-250.

[4] “Student”, Biometrika, Vol. 19 (1927), pp. 151-164.

[5] L. H. C. Tippett, “The extreme individuals and the range of samples taken from a normal population”, Biometrika, Vol. 17 (1925), pp. 364-387.

[6] E. S. Pearson, “A further note on the distribution of range in samples taken from a normal population”, Biometrika, Vol. 18 (1926), pp. 173-194.

[7] Tables for Statisticians and Biometricians, Part II, edited by Karl Pearson, pp. CX-CXIX.

[8] E. S. Pearson, “The percentage limits for the distribution of range in samples from a normal population”, Biometrika, Vol. 24 (1932), pp. 404-417.

[9] H. O. Hartley, “The range in random samples”, Biometrika, Vol. 32 (1942), pp. 334-348.

[10] E. S. Pearson and H. O. Hartley, “The probability integral of the range in samples of n observations from a normal population”, Biometrika, Vol. 32 (1942), pp. 301-310.

[11] A. T. McKay, “The distribution of the difference between the extreme observation and the sample mean in samples of n from a normal universe”, Biometrika, Vol. 27 (1935), pp. 466-471.

[12] W. R. Thompson, “On a criterion for the rejection of observations and the distribution of the ratio of the deviation to the sample standard deviation”, Annals of Math. Stat., Vol. 6 (1935), pp. 214-219.

[13] H. J. Godwin, “On the distribution of the estimate of mean deviation obtained from samples from a normal population”, Biometrika, Vol. 33 (1945), pp. 254-256.

[14] W.P.A., Tables of Probability Functions, Vols. I and II, New York, N.Y. (1942).

[15] H. O. Hartley, “Note on the calculation of the distribution of the estimate of mean deviation in normal samples”, Biometrika, Vol. 33 (1945), pp. 257-258.

[16] “Tables of the probability integral of the mean deviation in normal samples”, Biometrika, Vol. 33 (1946), pp. 259-265.

[17] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses”, Phil. Trans. Roy. Soc. (London), Vol. 231 (1933), pp. 289-337.

[18] E. S. Pearson and H. O. Hartley, “Tables of the probability integral of the studentized range”, Biometrika, Vol. 33 (1943), pp. 89-99.

[19] K. Pearson, Tables of the Incomplete Beta-Function, published by the Biometrika Office, University College, London.

[20] K. R. Nair, “The distribution of the extreme deviate from the sample mean and its studentized form”, Biometrika, Vol. 35 (1948), pp. 118-144.





DISTRIBUTION OF THE CIRCULAR SERIAL CORRELATION
COEFFICIENT FOR RESIDUALS FROM A FITTED FOURIER
SERIES¹,²

By R. L. Anderson and T. W. Anderson³

North Carolina State College and Columbia University

CONTENTS

Summary
1. Introduction
2. The use of fitted Fourier series
3. Tables of significance points of R
   3.1 Significance points of R using a seasonal trend for annual, semi-annual, bimonthly, and monthly data
   3.2 Significance points of R for other single-period trends
   3.3 Example of use of significance points
4. Testing the hypothesis of lack of serial correlation
   4.1 Statement of the problem
   4.2 Preliminary transformations
   4.3 The likelihood ratio criterion
5. The exact distribution of ${}_L R$
   5.1 Introduction
   5.2 Some special distributions of ${}_1 R$
   5.3 Some special distributions of ${}_L R$ for L > 1
   5.4 The exact distribution of ${}_L R$ when ρ ≠ 0
6. Moments
   6.1 The exact moments of R
   6.2 Approximate moments of R when ρ ≠ 0
   6.3 Approximate moment generating function of C and V when ρ ≠ 0
7. Approximate distributions of R
   7.1 The Pearson Type I (Incomplete Beta) distribution
   7.2 The normal approximation
References


Summary. In this paper the observations are considered to be normally distributed
with constant variance and means consisting of linear combinations
of certain trigonometric functions. The likelihood ratio criterion for testing the
independence of the observations against the alternatives of circular serial correlation
of a given lag is found to be a function of the circular serial correlation
coefficient for residuals from the fitted Fourier series (Section 4). The exact distribution
(Section 5), the moments (Section 6), and approximate distributions
(Section 7) are given for the cases of greatest interest. From these results significance
levels have been found (Section 3). The use of these levels is indicated
(Section 2), and an example of their use is given (Section 3).

¹ Included in Cowles Commission Papers, New Series, No. 42.

² Presented to the meeting of the Institute of Mathematical Statistics at New York,
December 30, 1947.

³ Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant
of the Cowles Commission for Research in Economics.


1. Introduction. Two mathematical models have been used extensively in
time-series analysis. In one model the observation is the sum of a “systematic
part” and a random error. The cyclical properties of this model result from the
cyclical properties of the systematic part, which is usually taken to be a short
Fourier series. The stochastic element is superimposed on the non-stochastic
part, and the error at one time point does not affect a later observation. The other
model is the stochastic difference equation or “autoregressive model.” An observation
is the sum of a linear function of previous observations and a random
element. The cyclical properties follow from the properties of the difference
equation (i.e., the linear combination of observations), but are disturbed by the
random disturbance that is integrated into the system. A more general model
can be constructed that includes both of the two mentioned. The observation
can be taken as a linear combination of past observations and Fourier terms plus
a random element.

In this paper, the linear combination will be only a multiple of some preceding
observation. For lag 1, the model is of the form

$$x_i - \mu_i = \rho(x_{i-1} - \mu_{i-1}) + u_i, \qquad i = 1, 2, \ldots, N, \tag{1}$$

where $x_0 = x_N$ and $\mu_0 = \mu_N$. In (1), the $\{x_i\}$ are the N observations; the $\{u_i\}$
are N random disturbances, each assumed normally and independently distributed
with zero mean and variance $\sigma^2$; the means $\{\mu_i\}$ are linear combinations
of some of the N functions of i: $\cos\dfrac{2\pi i g}{N}$ and $\sin\dfrac{2\pi i h}{N}$. For N odd, $g = 0, 1, \ldots, \tfrac{1}{2}(N-1)$;
$h = 1, \ldots, \tfrac{1}{2}(N-1)$. For N even, $g = 0, 1, \ldots, \tfrac{1}{2}N$; $h = 1, \ldots, \tfrac{1}{2}N - 1$. Hence,

$$\mu_i = \sum_{g'} \alpha_{g'} \cos\frac{2\pi i g'}{N} + \sum_{h'} \beta_{h'} \sin\frac{2\pi i h'}{N}, \tag{2}$$

where g' and h' run over certain values of the ranges of g and h, respectively.
Let K be the number of terms in (2). Usually the constant term, $\alpha_0$, is included
(in this case g' = 0 and $\cos\dfrac{2\pi i g'}{N} = 1$). Of the N trigonometric functions available,
K are usually chosen so that terms with certain periods are included
and terms with other periods are excluded. It should be noted that (1) ...

The sample estimates of $\alpha_{g'}$ and $\beta_{h'}$ are the usual regressions of $x_i$ on
$\cos\dfrac{2\pi i g'}{N}$ and $\sin\dfrac{2\pi i h'}{N}$, respectively. Because of the orthogonality of these
trigonometric terms, the estimates are




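$$\begin{aligned}
a_{g'} &= \sum_{i=1}^{N} x_i \cos\frac{2\pi i g'}{N} \Big/ \frac{N}{2}, \qquad
b_{h'} = \sum_{i=1}^{N} x_i \sin\frac{2\pi i h'}{N} \Big/ \frac{N}{2}, \qquad g' \ne 0, \tfrac{1}{2}N,\\
a_0 &= \sum_{i=1}^{N} x_i \Big/ N, \qquad
a_{N/2} = \sum_{i=1}^{N} x_i \cos \pi i \Big/ N = \sum_{i=1}^{N} (-1)^i x_i \Big/ N.
\end{aligned} \tag{3}$$

The fitted series is

$$m_i = \sum_{g'} a_{g'} \cos\frac{2\pi i g'}{N} + \sum_{h'} b_{h'} \sin\frac{2\pi i h'}{N}, \tag{4}$$

where the sums on g' and h' are over the ranges in (2).

The serial correlation coefficient suitable for this model is

$$R = \frac{\sum_{i=1}^{N}(x_i - m_i)(x_{i-1} - m_{i-1})}{\sum_{i=1}^{N}(x_i - m_i)^2}, \tag{5}$$

where $m_0 = m_N$. This statistic can be used to estimate ρ, or it can be used to
test hypotheses about ρ. In fact, for the circular model this statistic leads to the
best tests [3].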
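Formulas (3) to (5) translate directly into a short computation. The following is a minimal sketch in Python, assuming N is even when the term $\cos \pi i$ is fitted; the function names and the six-point data series are hypothetical illustrations.

```python
import math


def fit_fourier(xs, cos_freqs, sin_freqs):
    """Fitted values m_1..m_N from (3) and (4); xs[0] holds x_1.

    cos_freqs lists the g' values (0 gives the constant term) and
    sin_freqs lists the h' values used in the fitted series.
    """
    n = len(xs)
    fitted = [0.0] * n
    for g in cos_freqs:
        # Divisor N for the constant and the cos(pi i) term, N/2 otherwise.
        divisor = n if (g == 0 or 2 * g == n) else n / 2
        a = sum(x * math.cos(2 * math.pi * i * g / n)
                for i, x in enumerate(xs, start=1)) / divisor
        for i in range(1, n + 1):
            fitted[i - 1] += a * math.cos(2 * math.pi * i * g / n)
    for h in sin_freqs:
        b = sum(x * math.sin(2 * math.pi * i * h / n)
                for i, x in enumerate(xs, start=1)) / (n / 2)
        for i in range(1, n + 1):
            fitted[i - 1] += b * math.sin(2 * math.pi * i * h / n)
    return fitted


def circular_serial_correlation(xs, fitted, lag=1):
    """R of (5): circular serial correlation of the residuals x_i - m_i."""
    resid = [x - m for x, m in zip(xs, fitted)]
    # Negative indices wrap around, which supplies the convention x_0 = x_N.
    numerator = sum(resid[i] * resid[i - lag] for i in range(len(resid)))
    return numerator / sum(e * e for e in resid)


# Hypothetical semi-annual series: fit a constant (g' = 0) and cos(pi i) (g' = N/2).
series = [1.0, 2.0, 3.0, 2.0, 1.0, 4.0]
fitted = fit_fourier(series, cos_freqs=[0, 3], sin_freqs=[])
r_value = circular_serial_correlation(series, fitted)
```

For this series the fitted values are 5/3 at odd i and 8/3 at even i, and R works out to -0.5.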

It is hoped that the mathematical model studied in this paper can be used in 
the treatment of certain problems in economic time series. For example, the 
seasonal variation in a series of data may be considered as a “systematic part” 
made up of trigonometric components. In the next section we discuss in a more 
detailed way how the use of this model may arise in the field of economics. 

We have considered circular serial correlation, although in most statistical 
problems it is non-circular serial correlation that is involved. The reason for 
treating the circular case is the inherent mathematical simplicity. The circular 
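coefficient and Fourier series of the type (2) are “naturally” related. The relevant
fact is that the vectors

$$\left(\cos\frac{2\pi g}{N}, \cos\frac{4\pi g}{N}, \ldots, \cos\frac{2\pi N g}{N}\right), \qquad \left(\sin\frac{2\pi h}{N}, \sin\frac{4\pi h}{N}, \ldots, \sin\frac{2\pi N h}{N}\right)$$

are characteristic vectors of the matrix of the quadratic form in $(x_i - m_i)$ of
the numerator of R. For this reason the distribution and significance points
are easily obtained.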

In the usual applications the circular coefficient can be used even if the hypothesis
alternative to independence of observations is non-circular serial correlation.
The circular coefficient may not have as good power against non-circular
alternatives as non-circular coefficients, such as

$$\frac{\sum_{i=2}^{N}(x_i - m_i)(x_{i-1} - m_{i-1})}{\sum_{i=1}^{N}(x_i - m_i)^2}. \tag{6}$$

However, the difference between these two statistics is $(x_1 - m_1)(x_N - m_N) / \sum (x_i - m_i)^2$,
and it can be shown that this converges stochastically to zero (as
N increases and ρ remains fixed).


2. The use of fitted Fourier series. A linear combination of trigonometric
terms may be used as a regression function when there is a “systematic part”
(or “trend”) that is periodic. For instance, it may be reasonable to assume that
a series of agricultural data has a systematic component with certain periodicities
due to variation in weather. Then one may ask whether this regression function
“explains” all of the interrelations in the series.

An example taken from agricultural economics is a development of that given
by Koopmans [8]. Suppose $p_t$ and $q_t$ are the price and supply, respectively, of a
given farm product at time t. Let $Q_t^{(d)}$ be the quantity demanded at time t if
$p_t = P$, and $Q_t^{(s)}$ be the quantity supplied at time t if $p_{t-L} = P$, where P is an
arbitrarily selected point of reference on the price scale, serving to define the
Q’s. Let the market equations be defined as follows:

$$p_t - P = -\beta(q_t - Q_t^{(d)}) + u_t, \tag{7}$$

$$q_t - Q_t^{(s)} = \alpha(p_{t-L} - P) + v_t, \tag{8}$$

where u and v are random disturbances. The first equation expresses the price
depressing tendency of an abnormally large supply; the second expresses the
supply-stimulating influence of abnormally high prices L time units earlier (the
time between planning the product and selling it). We can substitute from (7)
at time (t - L) into (8) and obtain

$$q_t - Q_t^{(s)} = \rho(q_{t-L} - Q_{t-L}^{(d)}) + w_t, \tag{9}$$

which is of the form (1) for general lag L (i - 1 is replaced by t - L) if
$Q_t^{(s)} = Q_t^{(d)} = \mu_t$, with $\rho = -\alpha\beta$ and $w_t = \alpha u_{t-L} + v_t$. Now we may wish to test the null hypothesis,
$H_0$: ρ = 0. If we assume that our alternative hypothesis is $H_1$: ρ > 0, we can
test $H_0$ by use of the positive tail of the distribution of R. Similarly,
for $H_1$: ρ < 0, we would use the negative tail of the distribution of R. In
other cases, if we believe ρ ≠ 0, we might wish to estimate ρ.

Several cases in which one might wish to consider using the Fourier series for seasonal
variation are given below with indications of the
appropriate significance points for testing the hypothesis ρ = 0.

(a) Annual data. Here only a constant is fitted; this is the sample mean. The tables
given in [2] or [5] are to be used.

(b) Semi-annual data. To “correct” for variation of period two we fit a constant
and $\cos \pi t = (-1)^t$. The table given in Section 3 for P = 2 is to be used.

(c) Quarterly data. The four terms to be fitted are 1, $\cos \pi t = (-1)^t$, $\cos\dfrac{\pi t}{2}$,
and $\sin\dfrac{\pi t}{2}$. The table given in Section 3 for P = 2 and 4 is to be used.

(d) Bimonthly data. The six terms to be fitted are 1, $\cos \pi t$, $\cos\dfrac{\pi t}{3}$, $\sin\dfrac{\pi t}{3}$,
$\cos\dfrac{2\pi t}{3}$, and $\sin\dfrac{2\pi t}{3}$. The table given in Section 3 for P = 2, 3, and 6 is to be used.

(e) Monthly data. The twelve terms to be fitted are 1, $\cos\dfrac{\pi t}{6}$, $\sin\dfrac{\pi t}{6}$,
$\cos\dfrac{\pi t}{3}$, $\sin\dfrac{\pi t}{3}$, $\cos\dfrac{\pi t}{2}$, $\sin\dfrac{\pi t}{2}$, $\cos\dfrac{2\pi t}{3}$, $\sin\dfrac{2\pi t}{3}$,
$\cos\dfrac{5\pi t}{6}$, $\sin\dfrac{5\pi t}{6}$, and $\cos \pi t = (-1)^t$. The table given in Section 3 for
P = 2, 12/5, 3, 4, 6, and 12 is to be used.

It is assumed here that the data are given for each time interval in a certain
number of years. Then the residuals are the same as the residuals taken from
means for each month or season. That is, if the data are monthly, one may compute
the sample means for January, February, etc., and residuals are to be taken
from the corresponding monthly means. The fitted Fourier coefficients are certain
linear functions of these means.
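The equivalence between residuals from the full set of seasonal Fourier terms and residuals from seasonal means can be verified numerically. The following is a minimal sketch in Python for an even period, assuming the number of observations is a multiple of the period; the function names and the quarterly data are hypothetical.

```python
import math


def fourier_fitted(xs, period):
    """Fitted values when the constant and every harmonic of an even
    `period` are included, using the orthogonality formulas (3)."""
    n = len(xs)
    ts = list(range(1, n + 1))
    fitted = [sum(xs) / n] * n                    # constant term
    for k in range(1, period // 2 + 1):
        w = 2 * math.pi * k / period
        is_last = (2 * k == period)               # cos(pi t): no sine partner
        scale = n if is_last else n / 2
        a = sum(x * math.cos(w * t) for x, t in zip(xs, ts)) / scale
        b = 0.0 if is_last else sum(x * math.sin(w * t) for x, t in zip(xs, ts)) / scale
        fitted = [m + a * math.cos(w * t) + b * math.sin(w * t)
                  for m, t in zip(fitted, ts)]
    return fitted


def seasonal_mean_fitted(xs, period):
    """Fitted values equal to the mean of each season (position modulo period)."""
    means = [sum(xs[s::period]) / len(xs[s::period]) for s in range(period)]
    return [means[i % period] for i in range(len(xs))]


# Hypothetical quarterly series covering two years (N = 8, period 4).
quarterly = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
by_fourier = fourier_fitted(quarterly, period=4)
by_means = seasonal_mean_fitted(quarterly, period=4)
max_gap = max(abs(f - m) for f, m in zip(by_fourier, by_means))
```

The two sets of fitted values agree to rounding error, so in practice the seasonal means give the residuals needed for R without any trigonometric computation.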


3. Tables of significance points of R.

3.1. Significance points of R using a seasonal trend for annual, semi-annual, bimonthly,
and monthly data. The calculations of significance points of R (lag 1
only) have been subdivided according to the number of terms included in the
estimating equations, $m_i$. The significance points for only a constant in $m_i$ have
been tabulated in [2] and [5]. Since the main use for $m_i$ equations involving sine
and cosine terms seems to be for semi-annual, quarterly, bimonthly, and monthly
data, for which N is even, the results presented in this paper are for N
even. Then we will have all of the sine and cosine terms in pairs except for
$\cos \pi i = (-1)^i$ and the constant term. We shall find it convenient to refer to the
period $P = N/g'$ or $= N/h'$ of the terms in (2).

We have calculated significance points R', exact to 3 decimal places,
for $\Pr\{R > R'\} = \alpha = .01, .05, .95$, and .99. The values of R' corresponding
to α = .01 and .05 are usually indicated as the positive significance points
and those corresponding to α = .95 and .99, the negative significance points. In
all of these cases, except for annual data, the distribution of R is symmetrical.
Hence only the positive significance points need be given, since the negative
points are simply the corresponding positive points with opposite sign; that is,
R'(.95) = -R'(.05), R'(.99) = -R'(.01).

The significance points were calculated from the exact distribution of R
given in Section 5 for all N up to the values where the approximate significance
points using an Incomplete Beta distribution (Section 7) were the same as the
exact significance points. The Incomplete Beta significance points were used
up to the value of N for which a normal approximation was satisfactory. For
some of the results, the normal points became sufficiently accurate to be used
following the exact points.

The values of $R'$ are given in Table 1, except for (a), for the following values of $N$:

(a) Annual data: see the tables in [2] or [5].

(b) Semi-annual data ($P = 2$): $N = 6(2)60$. The exact points were needed for $N$ through 10 ($\alpha = .05$) and $N$ through 22 ($\alpha = .01$). The normal points could be used for $N = 60$ ($\alpha = .05$) but were still too large by .003 for $N = 60$ ($\alpha = .01$).

(c) Quarterly data ($P = 2, 4$): $N = 8(4)100$. The exact points were needed for $N$ through 20 ($\alpha = .05$) and $N$ through 32 ($\alpha = .01$). The normal points were adequate for all $N$ above 20 ($\alpha = .05$) but were still too large by .001 for $N = 100$ ($\alpha = .01$).

(d) Bimonthly data ($P = 2, 3, 6$): $N = 12(6)150$. The exact points were needed for $N$ through 24 ($\alpha = .05$) and $N$ through 30 ($\alpha = .01$). Again the normal points were adequate for all $N$ above 24 ($\alpha = .05$) but were still too large by .0005 for $N = 150$ ($\alpha = .01$).

(e) Monthly data ($P = 2, 12/5, 3, 4, 6, 12$): $N = 24(12)300$. The exact points were needed for $N = 24$ ($\alpha = .05$) and $N = 24, 36$ ($\alpha = .01$). The normal points were adequate for $N > 24$ ($\alpha = .05$) and $N > 300$ ($\alpha = .01$).⁴
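The comparisons with the normal points in (b) through (e) can be retraced numerically. The sketch below assumes that the normal approximation means a normal distribution with mean 0 and variance equal to the second moment of $R$ quoted in Section 7.1; the function name `normal_point` and the dictionary `VARIANCE` are ours and purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

def normal_point(variance, alpha):
    """Upper-tail point R' with Pr[R > R'] = alpha under N(0, variance)."""
    return NormalDist().inv_cdf(1.0 - alpha) * sqrt(variance)

# Second moments of R for the symmetric cases, as quoted in Section 7.1
# (assumed here to be the variances used for the normal points).
VARIANCE = {
    "b": lambda n: (n - 4) / (n * (n - 2)),
    "c": lambda n: 1 / (n - 2),
    "d": lambda n: 1 / (n - 4),
    "e": lambda n: 1 / (n - 10),
}

for case, n in [("b", 60), ("c", 100), ("d", 150), ("e", 300)]:
    var = VARIANCE[case](n)
    print(case, n,
          round(normal_point(var, 0.05), 3),
          round(normal_point(var, 0.01), 4))
```

Under this assumption the 5% normal points for the largest tabulated $N$ round to the three-decimal entries of Table 1, while the 1% points come out slightly larger, in line with the remarks in (b) through (d).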

Significance points for the Incomplete Beta approximation (see Section 7) are tabulated in terms of $2p$ and $2q$. The values of $2p$ and $2q$ are the same when $\mu_1'(R) = 0$; for (c), (d), and (e) above these values are simply $N - 3$, $N - 5$, and $N - 11$, respectively. Hence, for two-tailed significance points for these cases, the ordinary correlation tables can be used with $N - 3$, $N - 5$, and $N - 11$ degrees of freedom, respectively. Also, our one-tailed significance points can be approximated by use of the 10% and 2% significance points for the ordinary correlation coefficient. 10%, 5%, 2%, 1%, and 0.1% two-tailed significance points have been tabulated by Fisher and Yates [0]. These significance points are accurate to three decimal places for the serial correlation coefficients as follows:⁵

(c) $n = N - 3$ degrees of freedom: $N > 24$ ($\alpha = .05$); $N > 30$ ($\alpha = .01$),

(d) $n = N - 5$ degrees of freedom: $N > 24$ ($\alpha = .05$); $N > 30$ ($\alpha = .01$),

(e) $n = N - 11$ degrees of freedom: $N > 24$ ($\alpha = .05$ and $\alpha = .01$),

where $\alpha$ is the one-tailed significance point. For semi-annual data (b), $2p = 2q = \dfrac{N^2 - 3N + 4}{N - 4}$, which is not an integer for $N > 12$. When $N = 12$, $2p = 2q = 14$, for which the ordinary correlation significance point is adequate for $\alpha = .05$.


⁴ It should be noted that for (c), (d), and (e), an approximation given by Cochran [4] is easily computed and is more accurate than the normal approximation for the $\alpha = .01$ significance points.

⁵ $n$ is 2 less than the number of pairs used in computing the ordinary correlation coefficient when the sample means are first subtracted.



Details of computing techniques using the exact distribution are given by R. L. Anderson [1] for computing values of $R'$ when $\mu_1' = 0$.

3.2. Significance points of R for other single-period trends. Significance points have also been obtained for $P = 3$, $P = 4$, $P = 6$, and $P = 12$, for which $K' = 3$.

TABLE 1

Exact significance points, R', for different fitted series*

        P = 2        |      P = 2, 4      |    P = 2, 3, 6     | P = 2, 12/5, 3, 4, 6, 12
   N    .05    .01   |   N    .05    .01  |   N    .05    .01  |   N    .05    .01
   6   .495   .490   |   8   .030   .603  |  12   .592   .744  |  24   .441   .592
   8   .484   .607   |  12   .515   .001  |  18   .442   .592  |  36   .323   .445
  10   .453   .601   |  16   .439   .582  |  24   .309   .504  |  48   .267   .371
  12   .426   .572   |  20   .388   .523  |  30   .323   .445  |  60   .233   .325
  14   .402   .544   |  24   .351   .478  |  36   .291   .403  |  72   .209   .293
  16   .382   .519   |  28   .323   .441  |  42   .267   .371  |  84   .191   .268
  18   .304   .496   |  32   .300   .414  |  48   .248   .346  |  96   .177   .249
  20   .348   .476   |  36   .282   .391  |  54   .233   .325  | 108   .166   .234
  22   .334   .458   |  40   .207   .371  |  60   .220   .308  | 120   .157   .221
  24   .321   .442   |  44   .254   .354  |  66   .209   .293  | 132   .149   .210
  26   .310   .427   |  48   .243   .338  |  72   .200   .280  | 144   .142   .200
  28   .300   .414   |  52   .233   .325  |  78   .191   .268  | 156   .136   .192
  30   .290   .402   |  56   .224   .313  |  84   .184   .258  | 168   .131   .184
  32   .282   .390   |  60   .210   .302  |  90   .177   .249  | 180   .126   .178
  34   .274   .380   |  64   .209   .203  |  96   .172   .241  | 192   .122   .172
  36   .260   .370   |  68   .202   .284  | 102   .166   .234  | 204   .118   .166
  38   .200   .301   |  72   .197   .276  | 108   .101   .227  | 216   .115   .162
  40   .254   .353   |  76   .191   .268  | 114   .157   .221  | 228   .111   .157
  42   .248   .345   |  80   .180   .261  | 120   .153   .215  | 240   .108   .153
  44   .242   .338   |  84   .182   .255  | 126   .149   .210  | 252   .105   .149
  46   .237   .331   |  88   .177   .249  | 132   .145   .205  | 264   .103   .146
  48   .233   .324   |  92   .173   .243  | 138   .142   .200  | 276   .101   .142
  50   .228   .318   |  96   .170   .238  | 144   .139   .196  | 288   .099   .140
  52   .224   .318   | 100   .166   .234  | 150   .130   .192  | 300   .097   .136
  54   .220   .307   |
  56   .210   .302   |
  58   .212   .297   |
  60   .209   .292   |

* P = Periods Used in Fitted Series.


In these cases, the distribution of $R$ is asymmetrical. The Incomplete Beta approximation is symmetrical for $P = 3$, with $2p = 2q = N - 2$, even though the exact distribution is not.

The significance points for these single-period trends are given in Table 2.




The exact distribution was required to compute the $\alpha = .01$ and $.99$ significance points for $N$ through 48 in all cases and also for most cases with $\alpha = .05$ and $.95$. For $N > 48$, the Cochran approximation [4] gave the same results as the Incomplete Beta approximation. Since this Cochran approximation can be computed more rapidly, it should be used if other significance points are desired. The normal approximation is not recommended because it is less accurate than the Cochran approximation and requires almost as much calculation. For $\alpha = .01$ and $.99$, the significance points using the normal approximation were too large (in absolute value) by from .0005 to .001 for the last entries in Table 2. The two-

TABLE 2 

Exact significance points, R', for single periods > 2 


.05 

.01 

N 

.496 

.500 


.475 

.619 

12 

.392 

.526 

18 

.340 

.463 

24 

.304 

.417 

30 

.277 

.382 

36 

.256 

.356 

42 

.240 

.334 

48 

.226 

.316 

54 

.214 

.300 

60 

,204 

.286 

66 

.195 

.274 

72 

.187 

.263 

78 

.181 

,254 

84 

.175 

.245 

90 

.169 

.237 

90 

.164 

! .230 

102 

.159 

.224 

108 

155 

.218 

114 

.151 

.212 

120 


,051 j 
.509 | 
.427 j 
.973 | 
.335 j 
.300 | 
.283 i 
.204 j 
.248 | 
.235 | 
.224 j 
'.214 ! 


.290 j .500 
.277 j .440 
.254 ] .393 
.230 .359 

.220 i .332 
.207 j .311 


.197 .294 

.188 . 279 

.180 .206 
.173 I .255 
.107 : .240 
.101 .237 

.150 ,229 

.151 .222 

.147 .216 

.143 .210 

.140 ,205 

.137 ,200 







TABLE 2 (Continued)

P = 12


.V 


it 



A 


« 




itt 

#5 

w » 

01 


.99 

.05 

.05 

.01 

8 

- ■ , kk ;> 

. 70 S 

.503 

. 03 ? 

12 

--.778 

-.071 

,090 

.245 

12 

, f L 

, 1.08 

.420 

.585 

24 

-. 555 

-.444 

.197 

.330 

Hi 

■ . M 3 

.502 

.300 

.522 

30 

-.447 

— .348 

.188 

.298 

20 

■ .570 

141 

.333 

.174 

48 

- .383 

-.293 

.175 

.270 

24 

- 510 

. 300 

.300 

. 43 ? 

00 

— .339 

— .257 

.103 

.249 

28 

••,477 

.301 

. 2 X 5 

. 107 

72 

- . 30 ? 

-.231 

.153 

.231 

32 

.445 

• .331 

.208 

.383 

84 

- .283 

-.212 

. 145 

.217 

36 

• .418 

.312 

.253 

,303 

on 

— .203 

-. 190 

.138 

.200 

40 

~ .305 

- . 203 

241 

.345 

108 

- ,247 

-.183 

.132 

190 

44 

.375 

,*.t K 

.230 

.330 

120 

-.233 

-. 173 

,120 

,187 

48 

“ ,358 

.201 

.221 

.317 1 132 

-.221 

-.104 

.121 

.180 

52 

-■ .343 

.252 

.213 

.305 

144 ! 

- .211 

-.150 

.117 

.173 

50 

. 330 

.212 

.200 

.294 

156 

.202 

-.149 

.113 

.167 

«) 

• .310 

. 233 

.100 

.285 

Hi 8 

•-.104 

-. 143 

.110 

. 102 

04 

.308 

225 

.103 

.277 

180 

-.187 

- .138 

.107 

. 157 

68 

- .208 

• . 21 K 

.IKK 

,200 

102 

-.181 

- . 133 

.104 

.153 

72 

- .280 

- . 21 ! 

. 1 H 3 

.202 

201 ! 

-.175 

-.128 

.101 

.149 

70 

-.281 

,205 

.178 

.255 

216 | 

-.170 

- 124 

.099 

. 145 

80 

- .274 

- . 100 

.174 

.240 

228 ! 

-.105 

-.121 

.097 

.141 

84 

- .207 

. 104 

,170 

.243 

240 j 

~.HU 

-.117 

.094 

.138 

88 

-.201 

- . 180 

.100 

.238 

252 j 

-. 157 

-.114 

.092 

.135 

92 

- .255 

• .18} 

.102 

.233 i 204 ! 

-. 153 

-.111 

,091 

.132 

90 

— .210 

• , 1 H (1 

.150 

,228 

270 i 

-. 149 

-. 109 

.089 

.130 

100 

-.244 

—.170 

. 150 

.223 

288 | 

-. 140 

-.100 

.087 

.127 

108 

- .231 

• .100 

.150 

.215 

300 '■ 

-.143 

-.104 

.080 

,125 

120 

— .221 

•. 10(1 

.143 

.205 

l 





132 

- .210 

--.152 

. 130 

,100 

i 





144 

- .201 

■ .115 

. 131 

.187 

t 






tailed significance points can not be obtained from the ordinary correlation tables except for $P = 3$.

3.3. Example of use of significance points. As an example of the use of these significance points, $R'$, we shall consider the following data [17] on the receipts of butter (in units of 1,000,000 pounds) at five markets (Boston, Chicago, San Francisco, Milwaukee, and St. Louis). The figures in parentheses are deviations from the average of the given months over the 3 years.




                         Year
Month        1935            1936            1937          Total    Average
Jan.      48.9 (2.4)      48.3 (1.8)      42.4 (-4.1)      139.6      46.5
Feb.      43.4 (-0.6)     47.1 (3.1)      41.4 (-2.6)      131.9      44.0
March     43.8 (-4.6)     52.4 (4.0)      49.0 (0.6)       145.2      48.4
April     50.8 (-1.5)     55.3 (3.0)      50.8 (-1.5)      156.9      52.3
May       67.6 (1.6)      64.7 (-1.3)     65.8 (-0.2)      198.1      66.0
June      83.7 (0.7)      79.5 (-3.5)     85.9 (2.9)       249.1      83.0
July      82.7 (10.7)     62.6 (-9.4)     70.6 (-1.4)      215.9      72.0
Aug.      60.8 (4.8)      51.3 (-4.7)     55.8 (-0.2)      167.9      56.0
Sept.     55.4 (3.6)      51.0 (-0.8)     49.1 (-2.7)      155.5      51.8
Oct.      48.4 (-1.0)     54.0 (4.6)      45.7 (-3.7)      148.1      49.4
Nov.      37.7 (-4.5)     45.2 (3.0)      43.8 (1.6)       126.7      42.2
Dec.      41.0 (-3.2)     44.9 (0.7)      46.7 (2.5)       132.6      44.2
Total    664.2 (8.4)     656.3 (0.5)     647.0 (-8.8)     1967.5     655.8
Average  55.35 (0.70)    54.69 (0.04)    53.92 (-0.73)    163.96     54.65


We assume that the trend is composed of the 12 terms having periods that divide 12. We shall test the null hypothesis that the deviations from the trend are independently distributed against the alternative that there is positive serial correlation. The fitted series is of the form

(10)  $m_i = b_0^* + \sum_{j=1}^{5}\left(b_{2j-1}^*\cos\dfrac{\pi j i}{6} + b_{2j}^*\sin\dfrac{\pi j i}{6}\right) + b_{11}^*\cos \pi i;$

here we find it convenient to use the notation $b_0^*, b_1^*, \ldots, b_{11}^*$ for the coefficients (with a different relationship between the subscripts and the trigonometric functions than in (4)). We find that the $m_i$ are simply the average receipts given for each month in the above table (46.5, 44.0, ..., 44.2). Hence the deviations $(x_i - m_i)$ are given by the figures in parentheses (2.4, -0.6, ..., 2.5). The calculated lag 1 circular serial correlation coefficient is


(11)  $R = \dfrac{(2.4)(-0.6) + (-0.6)(-4.6) + \cdots + (1.6)(2.5) + (2.5)(2.4)}{(2.4)^2 + (-0.6)^2 + \cdots + (2.5)^2} = \dfrac{232.18}{474.51} = 0.489.$

For $P = 2, 12/5, 3, 4, 6,$ and $12$ and $N = 36$ we find that $R'(.05) = 0.323$ and $R'(.01) = 0.445$. Hence, at either the 5% or the 1% level, the null hypothesis of zero serial correlation would be rejected (against the alternative single-tail hypothesis $\rho > 0$). If we had been interested in the two-tailed alternative hypothesis, $\rho \neq 0$, we would use the ordinary correlation tables with $N - 11 = 25$ degrees of freedom and we would find that for the two-tailed test, $R'(.01) = 0.487$. Our value is significant at the 5% level and barely significant at the 1% level.
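The arithmetic in (11) can be retraced directly. The following is a minimal sketch: the list `deviations` holds the parenthesized figures of the butter table as we read them, in time order, and the function name `circular_serial_correlation` is ours.

```python
# Deviations from the monthly means (figures in parentheses in the table),
# in time order from Jan. 1935 through Dec. 1937.
deviations = [
    2.4, -0.6, -4.6, -1.5, 1.6, 0.7, 10.7, 4.8, 3.6, -1.0, -4.5, -3.2,   # 1935
    1.8, 3.1, 4.0, 3.0, -1.3, -3.5, -9.4, -4.7, -0.8, 4.6, 3.0, 0.7,     # 1936
    -4.1, -2.6, 0.6, -1.5, -0.2, 2.9, -1.4, -0.2, -2.7, -3.7, 1.6, 2.5,  # 1937
]

def circular_serial_correlation(d, lag=1):
    """Return (numerator, denominator, ratio) of the circular serial correlation."""
    n = len(d)
    # The index wraps around, so the last deviation is paired with the first.
    num = sum(d[i] * d[(i + lag) % n] for i in range(n))
    den = sum(v * v for v in d)
    return num, den, num / den

num, den, r = circular_serial_correlation(deviations)
print(round(num, 2), round(den, 2), round(r, 3))
```

The ratio can then be compared with the one-tailed points 0.323 and 0.445 and the two-tailed point 0.487 quoted in the text.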

The values of $b^*$ in (10) are computed as follows:

(12)  $b_0^* = \dfrac{1}{36}\sum_{i=1}^{36} x_i, \quad b_{2j-1}^* = \dfrac{1}{18}\sum_{i=1}^{36} x_i \cos\dfrac{\pi j i}{6}, \quad b_{2j}^* = \dfrac{1}{18}\sum_{i=1}^{36} x_i \sin\dfrac{\pi j i}{6}, \quad b_{11}^* = \dfrac{1}{36}\sum_{i=1}^{36} x_i \cos \pi i.$

The computed values of $b^*$ in (10) are 54.65, -14.82, -2.02, 6.60, 1.23, -3.98, 0.30, 2.21, 1.73, 0.61, 0.60, 0.15, respectively. However, it is not necessary to compute these values in order to obtain $m_i$. The problem of estimating the variances of these $b^*$'s will be discussed in Section 4.


4. Testing the hypothesis of lack of serial correlation.

4.1. Statement of the problem. Consider the $N$ random variables $u_1, \ldots, u_N$, each normally and independently distributed with mean 0 and variance $\sigma^2$. Define the $N$ variables $x_1, \ldots, x_N$ by the equations

(13)  $x_i - \mu_i = \rho\,(x_{i-L} - \mu_{i-L}) + u_i \qquad (i = 1, \ldots, N),$

where

(14)  $x_{-j} = x_{N-j}, \quad \mu_{-j} = \mu_{N-j} \qquad (j = 0, 1, \ldots, N - 1)$

and $\mu_i$ is the linear combination of trigonometric functions given in (2). If $L$ and $N$ are relatively prime (in particular, if $L = 1$), the Jacobian of the transformation from $\{u_i\}$ to $\{x_i\}$ is $1 - \rho^N$, and the probability density of $\{x_i\}$ is

(15)  $\dfrac{1 - \rho^N}{(2\pi\sigma^2)^{N/2}}\, e^{-Q/(2\sigma^2)},$

where $Q = (1 + \rho^2)\sum_{i=1}^{N} (x_i - \mu_i)^2 - 2\rho \sum_{i=1}^{N} (x_i - \mu_i)(x_{i-L} - \mu_{i-L})$. If $L = 1$, the covariance between $x_i$ and $x_j$ is $\sigma^2[\rho^{|i-j|} + \rho^{N-|i-j|}]/[(1 - \rho^2)(1 - \rho^N)]$. If $L = qa$ and $N = pa$, where $p$, $q$, and $a$ are positive integers and $q$ and $p$ are relatively prime, then the Jacobian is $(1 - \rho^p)^a$ and the density of $\{x_i\}$ is

(16)  $\dfrac{(1 - \rho^p)^a}{(2\pi\sigma^2)^{N/2}}\, e^{-Q/(2\sigma^2)}.$
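The covariance stated for $L = 1$ can be checked numerically. The sketch below rests on one step not spelled out in the text: solving the circular recursion $x_i = \rho x_{i-1} + u_i$ (taking $\mu_i = 0$) around the circle gives $x_i = \sum_{s=0}^{N-1} w_s u_{i-s}$ with $w_s = \rho^s/(1 - \rho^N)$. The function names are ours.

```python
def circular_ar1_cov(rho, n, k, sigma2=1.0):
    """Covariance at circular distance k, from the moving-average weights.

    Assumes x_i = sum_s w_s * u_{i-s} with w_s = rho**s / (1 - rho**n),
    s = 0..n-1, and independent u's of variance sigma2.
    """
    w = [rho ** s / (1.0 - rho ** n) for s in range(n)]
    return sigma2 * sum(w[s] * w[(s + k) % n] for s in range(n))

def closed_form(rho, n, k, sigma2=1.0):
    """The covariance formula stated in the text for lag L = 1."""
    return sigma2 * (rho ** k + rho ** (n - k)) / ((1 - rho ** 2) * (1 - rho ** n))

# Largest disagreement over all distances for one illustrative choice of rho, N.
worst = max(abs(circular_ar1_cov(0.5, 12, k) - closed_form(0.5, 12, k))
            for k in range(12))
print(worst)
```

The two expressions should agree to rounding error for any $|\rho| < 1$; the values $\rho = 0.5$, $N = 12$ are arbitrary.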



We shall now obtain the likelihood ratio test of the hypothesis $H_0: \rho = 0$ on the basis of a sample consisting of one observation on each $x_i$.

4.2. Preliminary transformations. We shall find it convenient to express $\mu_i$ in terms of fixed variates $\phi_{ij}$, having certain properties. Later we will verify that the $\phi$'s are simply constant multiples of the trigonometric terms in (2). We suppose now that

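(17)  $\mu_i = \sum_{j=1}^{K'} \phi_{ij}\gamma_j \qquad (i = 1, \ldots, N),$

where $K' < N$, the $\{\gamma_j\}$ are parameters, and the $\phi_{ij}$ are known functions of $i$ and $j$ satisfying

(18)  $\phi_{i-L,j} + \phi_{i+L,j} = 2\lambda_{Lj}\,\phi_{ij} \qquad (i = 1, \ldots, N;\ j = 1, \ldots, K'),$

(19)  $\sum_{i=1}^{N} \phi_{ij}\phi_{ik} = \delta_{jk} \qquad (j, k = 1, \ldots, K'),$

(20)  $\phi_{-i,j} = \phi_{N-i,j} \qquad (i = 0, 1, \ldots, N - 1),$

and $\delta_{jk}$ is the Kronecker delta. Let

(21)  $m_i = \sum_{j=1}^{K'} \phi_{ij} c_j,$

where

(22)  $c_j = \sum_{i=1}^{N} x_i \phi_{ij}.$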


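Then by usual regression theory we have

(23)  $\sum_{i=1}^{N} (x_i - m_i)\,\phi_{ij} = 0,$

(24)  $\sum_{i=1}^{N} (x_i - \mu_i)^2 = \sum_{i=1}^{N} (x_i - m_i)^2 + \sum_{j=1}^{K'} (c_j - \gamma_j)^2$

because $c_j$ is the least squares estimate of $\gamma_j$. Let us evaluate

(25)  ${}_L\bar{C} = \sum_{i=1}^{N} (x_i - \mu_i)(x_{i-L} - \mu_{i-L})$
      $= \sum_{i=1}^{N} [(x_i - m_i) + (m_i - \mu_i)][(x_{i-L} - m_{i-L}) + (m_{i-L} - \mu_{i-L})]$
      $= \sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{i=1}^{N}\sum_{j=1}^{K'} \phi_{i-L,j}(c_j - \gamma_j)(x_i - m_i)$
      $\quad + \sum_{i=1}^{N}\sum_{j=1}^{K'} \phi_{ij}(c_j - \gamma_j)(x_{i-L} - m_{i-L}) + \sum_{i=1}^{N}\sum_{j=1}^{K'}\sum_{k=1}^{K'} \phi_{ij}\phi_{i-L,k}(c_k - \gamma_k)(c_j - \gamma_j).$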



Call the first term on the right hand side of (25) ${}_LC$. In view of (20) the next two terms are

(26)  $\sum_{i=1}^{N}\sum_{j=1}^{K'} (\phi_{i-L,j} + \phi_{i+L,j})(x_i - m_i)(c_j - \gamma_j).$

This is seen to be zero by consideration of (18) and (23). The last term can be written

(27)  $\tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{K'}\sum_{k=1}^{K'} (\phi_{i-L,j} + \phi_{i+L,j})\,\phi_{ik}\,(c_k - \gamma_k)(c_j - \gamma_j) = \sum_{j=1}^{K'} \lambda_{Lj}(c_j - \gamma_j)^2$

by use of (18), (19), and (20). Thus

(28)  ${}_L\bar{C} = \sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{j=1}^{K'} \lambda_{Lj}(c_j - \gamma_j)^2.$

It follows that

(29)  $Q = (1 + \rho^2)\sum_{i=1}^{N} (x_i - m_i)^2 - 2\rho\sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{j=1}^{K'} (1 + \rho^2 - 2\rho\lambda_{Lj})(c_j - \gamma_j)^2.$


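We can complete the matrix $\Phi = (\phi_{ij})$ so that $\Phi$ is an $N$-th order square matrix with elements satisfying (18), (19), and (20). If we make the transformation

(30)  $x_i = \sum_{j=1}^{N} \phi_{ij} c_j \qquad (i = 1, \ldots, N),$

then

(31)  $\sum_{i=1}^{N} (x_i - m_i)^2 = \sum_{j=K'+1}^{N} c_j^2,$

(32)  $\sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) = \sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2.$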

4.3. The likelihood ratio criterion. To obtain the likelihood ratio test of the hypothesis $H_0: \rho = 0$ against alternative hypotheses $H_a: \rho \neq 0$, we divide the maximum of the likelihood assuming $H_0$ by the maximum of the likelihood assuming $H_a$. It is clear from (15) and (29) that if $H_0$ is true, the maximum likelihood estimates of $\gamma_j$ and $\sigma^2$ are $c_j$ and

(33)  $s_0^2 = \dfrac{1}{N}\sum_{i=1}^{N} (x_i - m_i)^2,$

respectively. If $H_a$ is true, the maximum likelihood estimate of $\gamma_j$ is $c_j$. To state the maximum likelihood estimates of $\sigma^2$ and $\rho$ under $H_a$ it is convenient to define ${}_LR$, the sample serial coefficient of lag $L$, as

(34)  ${}_LR = \dfrac{1}{N s_0^2}\sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}).$



Then the maximum likelihood estimate of $\sigma^2$ under $H_a$ is⁶

(35)  $s^2 = s_0^2\,(1 + \hat\rho^2 - 2\hat\rho\,{}_LR),$

where $\hat\rho$ is the maximum likelihood estimate of $\rho$ and satisfies

(36)  ${}_LR\,(1 + \hat\rho^N) - \hat\rho\,(1 + \hat\rho^{N-2}) = 0,$

if $L$ and $N$ are relatively prime and satisfies

(37)  ${}_LR\,(1 + \hat\rho^p) - \hat\rho\,(1 + \hat\rho^{p-2}) = 0,$

if $L = qa$, $N = pa$, and $p$ and $q$ are relatively prime.

Upon substituting these estimates into the likelihood function we find that the likelihood ratio criterion is

(38)  $\lambda = \dfrac{(1 + \hat\rho^2 - 2\hat\rho\,{}_LR)^{N/2}}{1 - \hat\rho^N}$

if $L$ and $N$ are relatively prime and

(39)  $\lambda = \dfrac{(1 + \hat\rho^2 - 2\hat\rho\,{}_LR)^{N/2}}{(1 - \hat\rho^p)^a}$

if $L = qa$, $N = pa$ and $p$ and $q$ are relatively prime. The maximum likelihood estimate of $\rho$ is the root of (36) or (37) that makes (38) or (39), respectively, a minimum. It should be noticed that throughout this section $\rho$ could be replaced by $1/\rho$ (and changing $\sigma^2$ by a factor $1/\rho^2$). To make the maximum likelihood estimate unique, we require that $|\hat\rho| \le 1$. It can be shown that there exists one and only one root of (36) or (37) that satisfies this requirement and minimizes $\lambda$. (There is a peculiarity to this solution in that if $N$ is odd, $L = 1$, and $R < -1 + 2/N$, then $\hat\rho = -1$ is the root minimizing $\lambda$.) In any case, $\lambda$ is a function of ${}_LR$. We have shown that for $0 < {}_LR < 1$, it is a monotonic decreasing function; and for $-1 < {}_LR < 0$, it is a monotonic increasing function. A critical region defined by $\lambda < \lambda_0$ can, therefore, be defined by ${}_LR < R_1 < 0$ and $0 < R_2 < {}_LR$. (The probability that ${}_LR = -1$ or $+1$ is 0.) Thus we can use ${}_LR$ to test the null hypothesis $H_0: \rho = 0$ instead of the likelihood ratio criterion (against one-sided alternatives they are equivalent). The strongest justification for the use of ${}_LR$ in testing $H_0: \rho = 0$ is that for circular distributions the uniformly most powerful tests against one-sided alternatives and the $B_1$ test against two-sided alternatives are given in terms of inequalities on ${}_LR$ [3].
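To see how close the root of (36) lies to ${}_LR$ itself, the equation can be solved numerically. The sketch below is only an illustration under our own assumptions: it takes $N$ even and $|{}_LR| < 1$, for which the left-hand side of (36) is positive at $\hat\rho = -1$, negative at $\hat\rho = +1$, and non-increasing in between, so that plain bisection applies. The names `score` and `mle_rho` are ours.

```python
def score(rho, r, n):
    """Left-hand side of (36): r*(1 + rho**n) - rho*(1 + rho**(n - 2))."""
    return r * (1 + rho ** n) - rho * (1 + rho ** (n - 2))

def mle_rho(r, n, tol=1e-14):
    """Root of (36) in (-1, 1) by bisection; assumes n even and |r| < 1."""
    lo, hi = -1.0, 1.0          # score(-1) = 2r + 2 > 0, score(1) = 2r - 2 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, r, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values: the butter example (r = 0.489, N = 36) and a very short series.
print(mle_rho(0.489, 36), mle_rho(0.5, 4))
```

For $N = 36$ the root is indistinguishable from ${}_LR$ at three decimals, while for very small $N$ the two can differ noticeably.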

⁶ The maximum likelihood estimate of $\rho$ is asymptotically a root of ${}_LR - \hat\rho = 0$. This is proved by showing that ${}_LR\,(1 + \hat\rho^N) - \hat\rho\,(1 + \hat\rho^{N-2}) - ({}_LR - \hat\rho)$ converges stochastically to zero. We shall use ${}_LR$ both to estimate $\rho$ and to test hypotheses about this parameter.

⁷ ... maximum likelihood estimate for $\rho$ ($\mu_i$ a constant) by neglecting the Jacobian.

We now express the $\phi_{ij}$ used in Section 4.2 in terms of the trigonometric terms indicated in Section 1. In the rest of the paper we shall let the index $g$ run from 0 to $\tfrac{1}{2}N$ for $N$ even and from 0 to $\tfrac{1}{2}(N - 1)$ for $N$ odd; we let the index $h$ run from 1 to $\tfrac{1}{2}N - 1$ for $N$ even and from 1 to $\tfrac{1}{2}(N - 1)$ for $N$ odd. We shall use a prime to denote an index running over those values corresponding to fitted terms and a double prime to denote an index running over those values corresponding to terms not fitted.

Let the $N$ trigonometric functions of $i$, namely $\cos\dfrac{2\pi i g}{N}$ and $\sin\dfrac{2\pi i h}{N}$, be numbered from 1 to $N$ such that the fitted terms are numbered from 1 to $K'$ and the non-fitted terms from $K' + 1$ to $N$. According to this numbering we define $\phi_{ij}$ to be

(40)  $\phi_{ij} = \sqrt{2/N}\,\cos\dfrac{2\pi i g}{N} \quad (g \neq 0, \tfrac{1}{2}N), \qquad \phi_{ij} = \sqrt{1/N}\,\cos\dfrac{2\pi i g}{N} \quad (g = 0, \tfrac{1}{2}N),$

(41)  $\phi_{ij} = \sqrt{2/N}\,\sin\dfrac{2\pi i h}{N}.$



Defined this way, the $\phi_{ij}$ satisfy (18), (19), and (20). It can be shown by using the addition formulas for sines and cosines that

(42)  $\lambda_{Lj} = \cos\dfrac{2\pi L f}{N},$

where $f = g$ or $f = h$ depending on whether $j$ refers to a term (40) or (41). We shall assume that the numbering of trigonometric functions is such that

(43)  $\lambda_{L,K'+1} \ge \lambda_{L,K'+2} \ge \cdots \ge \lambda_{L,N}.$

It can easily be seen that (2) is of the form (17) except that $\alpha_{g'}$ and $\beta_{h'}$ must be multiplied by $\sqrt{N/2}$ unless $g' = 0$ or $\tfrac{1}{2}N$, and by $\sqrt{N}$ for $g' = 0$ or $\tfrac{1}{2}N$, to obtain $\gamma_j$. The regression coefficients $a_{g'}$ and $b_{h'}$ are similarly related to the $c_j$.

It can be seen from (29) that the $a_f$ and $b_f$ are independently distributed with variance $2\sigma^2\big/\big[N\big(1 + \rho^2 - 2\rho\cos\tfrac{2\pi L f}{N}\big)\big]$ for $f \neq 0, \tfrac{1}{2}N$, and variance $\sigma^2/[N(1 - \rho)^2]$ for $f = 0$ and for $f = \tfrac{1}{2}N$ if $L$ is even, and $\sigma^2/[N(1 + \rho)^2]$ for $f = \tfrac{1}{2}N$ if $L$ is odd. In these variance formulas we can estimate $\sigma^2$ from (35) using ${}_LR$ for $\hat\rho$ and $\rho$.
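Properties (18) and (19) can be checked numerically for normalized trigonometric functions. The sketch below assumes $N$ even and the usual orthonormal scaling (factor $\sqrt{2/N}$, or $\sqrt{1/N}$ for the constant and for $\cos \pi i$); this scaling is our assumption about the $\phi_{ij}$, and the names `phi_matrix` and `max_violation` are ours.

```python
from math import cos, sin, pi, sqrt

def phi_matrix(n):
    """Normalized cosines (f = 0..n/2) then sines (f = 1..n/2 - 1); n even.

    Each entry is (f, column), the column being indexed by i = 1..n.
    """
    cols = []
    for f in range(0, n // 2 + 1):
        scale = sqrt(1 / n) if f in (0, n // 2) else sqrt(2 / n)
        cols.append((f, [scale * cos(2 * pi * i * f / n) for i in range(1, n + 1)]))
    for f in range(1, n // 2):
        cols.append((f, [sqrt(2 / n) * sin(2 * pi * i * f / n) for i in range(1, n + 1)]))
    return cols

def max_violation(n, lag):
    """Largest absolute deviation from (18) and (19) over all columns."""
    cols = phi_matrix(n)
    worst = 0.0
    for a, (fa, ca) in enumerate(cols):
        lam = cos(2 * pi * lag * fa / n)                 # lambda as in (42)
        for i in range(n):
            # (18), with circular indexing as in (20)
            lhs = ca[(i - lag) % n] + ca[(i + lag) % n]
            worst = max(worst, abs(lhs - 2 * lam * ca[i]))
        for b, (_, cb) in enumerate(cols):
            dot = sum(x * y for x, y in zip(ca, cb))
            worst = max(worst, abs(dot - (1.0 if a == b else 0.0)))   # (19)
    return worst

print(max_violation(12, 1), max_violation(12, 5))
```

Both printed values should be at the level of rounding error.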


5. The exact distribution of ${}_LR$.

5.1. Introduction. Under the null hypothesis $H_0: \rho = 0$ the observations $\{x_i\}$ are normally and independently distributed with variance $\sigma^2$ and means $E x_i = \mu_i$. The variables $c_j$ defined by (22) and (30) are normally and independently distributed with variance $\sigma^2$ and means $\gamma_j$. For $j > K'$, $\gamma_j = 0$. It follows from (31), (32), (33), and (34) that

(44)  ${}_LR = \dfrac{\sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2}{\sum_{j=K'+1}^{N} c_j^2},$

where the $\lambda_{Lj}$ are given by (42) corresponding to the $K'' = (N - K')$ trigonometric terms not fitted. Thus to obtain the distribution of ${}_LR$ we need only consider the joint distribution of $\{c_j\}$, $j = K' + 1, \ldots, N$. If $H_a$ is true, the joint density of all the $c_j$ is (15), where

(45)  $Q = (1 + \rho^2)V - 2\rho\,{}_LC + \sum_{j=1}^{K'} (1 + \rho^2 - 2\rho\lambda_{Lj})(c_j - \gamma_j)^2$

and

$V = \sum_{j=K'+1}^{N} c_j^2 \quad\text{and}\quad {}_LC = \sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2.$


5.2. Some special distributions of ${}_1R = R$. If the constant term ($g' = 0$) is fitted and the other terms are fitted in pairs ($\cos\frac{2\pi i f'}{N}$ and $\sin\frac{2\pi i f'}{N}$), $K'$ is odd. If $N$ is odd, then $K''$ is even; the $\lambda_{1j}$ occur in pairs and we can define $\lambda''_k$ as

(46)  $\lambda_{1,K'+1} = \lambda_{1,K'+2} = \lambda''_1 > \lambda_{1,K'+3} = \lambda_{1,K'+4} = \lambda''_2 > \cdots > \lambda_{1,N-1} = \lambda_{1,N} = \lambda''_{K''/2}.$

This also holds if $N$ is even and if, in addition to the constant term and paired cosines and sines, we fit $\cos \pi i = (-1)^i$ ($g' = N/2$). If $N$ is even and we do not fit $\cos \pi i$, we have $K''$ odd. Then

(47)  $\lambda_{1,K'+1} = \lambda_{1,K'+2} = \lambda''_1 > \lambda_{1,K'+3} = \lambda_{1,K'+4} = \lambda''_2 > \cdots > \lambda_{1,N-2} = \lambda_{1,N-1} = \lambda''_{(K''-1)/2} > \lambda_{1,N} = \lambda''_{(K''+1)/2} = -1.$

The general expression for the distribution of $R$ in these cases has been found by one of the authors [2]. In this case the cumulative distribution function is 1 minus


Pr{R > R >) = g (-1)*+* | 7, |( X " _ R')'*"~\ 

C* <R' < x", 

where 7* is found from a result of Lehmann [9] to be 

n-m __ 

N 


(48) 


(49) 


N 


Bin 2 4r- sin *-L II V(xT 


Xy<)> 


where f is such that X* = cos and the product on? is over the K' terms 

* K' MK>’ takes . on *' “ 1 ^ues in *( K‘ ~ 1) pairs 

also JriteVTas ^ ~ Paiia P US a Smgle Xw ' “ even. We can 





Vk = 


(50) 


^(V+iC') 

____ 


. 2 t/ 
sin —sm 




A 1 (Vo KA A 


n/ si »^ 


+ /">...-w -n 


sm 


IV N ' 

5.3. Some special distributions of ${}_LR$ for $L > 1$. We have noted in (44) above that $\lambda_{Lj} = \cos\dfrac{2\pi L f''}{N}$, where $f''$ corresponds to a term not used in the estimation equations for $m_i$, which was a function of $\left\{\cos\dfrac{2\pi i f'}{N}, \sin\dfrac{2\pi i f'}{N}\right\}$. If $L$, the lag, is relatively prime to $N$, the distribution is the same as that given above for $L = 1$, except for the re-evaluating of the $\lambda''_k$. In the article by R. L. Anderson [2], where only the constant term in $m_i$ was used, the $\lambda_k$ for lag $L$ were exactly the same as the $\lambda_k$ for lag 1. However, this will not be the case for other terms used in $m_i$. For example, consider lag 2 and $N$ odd with $m_i$ consisting of the constant term plus terms in $\cos\dfrac{2\pi i}{N}$ and $\sin\dfrac{2\pi i}{N}$. In this case the $\lambda''_k$ for lag 1 are

$\cos\dfrac{4\pi}{N},\ \cos\dfrac{6\pi}{N},\ \ldots,\ \cos\dfrac{(N-1)\pi}{N},$

and the $\lambda''_k$ for lag 2 are

$\cos\dfrac{2\pi}{N},\ \cos\dfrac{6\pi}{N},\ \cos\dfrac{8\pi}{N},\ \ldots,\ \cos\dfrac{(N-1)\pi}{N}.$

Next suppose the highest common factor of $L$ and $N$ is $a$ (as before, $L = qa$ and $N = pa$, with $p$ and $q$ relatively prime). In this case

(51)  $\lambda_{Lj} = \cos\dfrac{2\pi q f''}{p}.$

Since $p$ and $q$ are relatively prime, the results are the same as for $q$ replaced by 1 and $L$ replaced by $a$. Each root is repeated $a$ times.
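The lag 2 example above can be retraced by reducing each unfitted frequency modulo $N$. The sketch below assumes $N$ odd and returns, for each unfitted $f$, the index $j$ with $\cos(2\pi L f/N) = \cos(2\pi j/N)$ and $0 \le j \le (N-1)/2$; the function name and argument layout are ours.

```python
def unfitted_frequencies(n, fitted, lag):
    """Indices j such that the unfitted roots are cos(2*pi*j/n); n odd.

    `fitted` is the set of frequencies f >= 1 whose cosine and sine are in m_i
    (the constant term, f = 0, is always taken as fitted here).
    """
    out = []
    for f in range(1, (n - 1) // 2 + 1):
        if f in fitted:
            continue
        k = (lag * f) % n
        out.append(min(k, n - k))      # cos is even and has period n in k
    return sorted(out)

# N = 11, constant plus the pair with f = 1 fitted (illustrative choice).
print(unfitted_frequencies(11, {1}, lag=1))
print(unfitted_frequencies(11, {1}, lag=2))
```

For lag 1 the indices start at $j = 2$ (that is, $\cos 4\pi/N$), while for lag 2 the index $j = 1$ reappears and $j = 2$ drops out, as stated in the text.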

N = 2 !L(p = 2 ) 

If we let N = 2 L, X* = cos nk = +1 or — 1. X^ = +1 corresponds to these 
fitted terms in m,: -jl, cos , sin j- for g\ h' even. X” ~ — 1 corresponds 
27 rig' 2irih'\ , 


to these terms: ^cos —j~~ , sin for g', h' odd. Let L — «i be the number 


N 


of terms pertaining to X = +1 and L — n 2 be the number of terms for 
X" = —1. Then, as in [2], we have the density 

(52) 


tic o \_ (I cRi) i(ni ~ 2) {i + 

w , \rh) - 


where ${}_LR_2$ was the notation used for lag $L$ and $p = 2$. The cumulative function is the Incomplete Beta function, found by setting $x = \tfrac{1}{2}(1 - R')$.

$N = 3L$ ($p = 3$)

If we let $N = 3L$, $\lambda''_k = \cos\dfrac{2\pi k}{3} = +1, -\tfrac{1}{2}$. The fitted terms in $m_i$ corresponding to $\lambda'' = 1$ are $\left\{1, \cos\dfrac{2\pi i g'}{N}, \sin\dfrac{2\pi i h'}{N}\right\}$ for $g', h' = 3m$. Similarly, those corresponding to $\lambda'' = -\tfrac{1}{2}$ have $g', h' = 3m - 1$ or $3m - 2$. Let the number of fitted terms with $\lambda'' = +1$ be $L - n_1$ and with $\lambda'' = -\tfrac{1}{2}$ be $2L - n_2$. Then


(53) 


n( p s _ (1 - ,fl,) i(ni ~ 2) (4 + 

U 3j (3/2) lto,+ni) -^C47n , hrv>) 


where $R$ is ${}_LR_3$. This cumulative function is also an Incomplete Beta function, found by setting $x = 2(1 - R')/3$.

$N = 4L$ ($p = 4$)

If $N = 4L$, $\lambda''_k = \cos\dfrac{\pi k}{2} = +1, 0, -1$. The fitted terms in $m_i$ corresponding to $\lambda'' = 1$ have $f'' = 4m$, those for $\lambda'' = -1$ have $f'' = 4m - 2$; and those for $\lambda'' = 0$ have $f'' = 4m - 1$ or $4m - 3$. Let the number of terms in $m_i$ of each sort be $L - n_1$, $L - n_2$, and $2L - n_3$, respectively. Then


(54) D(R) = c 


(1 + f (t _ y )K»i-»> 

• ((1 - R) - y(l + rf2/> 

(1 - f l ^ <n '- s5 (i _ y)**"*- 1 ” 

‘'v—0 


for R < 0, 


l ■ 1(1 4- R) -1/(1 ~ R)] il " 1-,> dy, for fl > Q, 

where R is ifl t andc = T(4[n l + n* + 7lJ)/[^(4n^)^(4n ^ )^(4n3)2 lc,n+ " , " ^, ]. 

5.4. The exact distribution of ${}_1R$ when $\rho \neq 0$. The joint distribution of the observations for lag 1 when the null hypothesis is not true ($\rho \neq 0$) is (15), where $Q$ is given by (45) with $L = 1$ and ${}_1C = RV$. $V$, $R$, $\{c_j\}$ ($j = 1, \ldots, K'$) are a sufficient set of statistics for estimating $\sigma^2$, $\rho$, and $\{\gamma_j\}$ ($j = 1, \ldots, K'$). Using the results given by Madow [11], it can be shown that the simultaneous distribution of $V$ and $R$ is


(55) 


1 - o* 


r j—i 


+ p 2 — 2pXi^.) 




where $D(R)$ is the density function corresponding to (48). Integrating $V$ from 0 to $\infty$ we obtain as the density for $R$

(56)  $\dfrac{(1 - \rho^N)\,(\tfrac{1}{2}K'' - 1)}{\prod_{j'=1}^{K'} (1 + \rho^2 - 2\rho\lambda_{1j'})^{1/2}\,(1 + \rho^2 - 2\rho R)^{K''/2}} \sum_{k=1}^{m} (-1)^{k+1} (\lambda''_k - R)^{\frac{1}{2}K'' - 2}\,|V_k|,$

for $\lambda''_{m+1} \le R \le \lambda''_m$, where $V_k$ are given by (50). In the same way, one obtains the distribution of ${}_LR$ for $\rho \neq 0$ when $N = 2L$, $N = 3L$, and $N = 4L$ by multiplying (52), (53), and (54), respectively, by

(57)  $\dfrac{(1 - \rho^p)^a}{\prod_{j'=1}^{K'} (1 + \rho^2 - 2\rho\lambda_{Lj'})^{1/2}\,(1 + \rho^2 - 2\rho R)^{K''/2}},$

where $K'' = n_1 + n_2$ or $n_1 + n_2 + n_3$. This method was used by Madow for residuals from the sample mean [12].


6. Moments.

6.1. The exact moments of R. Most of the results of this section are straightforward adaptations of earlier results for the case of $\mu_i$ constant. Hence, we shall omit the details of derivations. The moment generating function of $V$ and $C$ for $L = 1$ is


(58) 


4>(t o, i) = E(e 


= W/A" r+,c 


) = 


1 - p* 


n r 

H 1 + P — — 2(p + f) Xi,," 

i"-x'+i L 


The $h$-th moment of $R = C/V$ is given by

i.0 fVh-l fl/l a* jl"| 

(59) w(fi) = / / ■ • ■ / J\ dto II dy t , 

V—oo J—oo *1— oO ut f“«0 


with the $\{y_i\}$ restricted from being too large (not more than a certain amount larger than zero). In the case of independence ($\rho = 0$), we have the following first two moments of $R$:

(60)  $\mu_1'(R) = \dfrac{1}{K''}\sum_{j''=K'+1}^{N} \lambda_{1j''}; \qquad \mu_2'(R) = \dfrac{2}{K''(K'' + 2)}\sum_{j''=K'+1}^{N} \lambda_{1j''}^2 + \dfrac{K''}{K'' + 2}\,[\mu_1'(R)]^2.$

If the $\lambda_{1j''}$ are symmetrical (i.e. for each $\lambda_{1j''}$ there is a $\lambda_{1k''} = -\lambda_{1j''}$), the mean of $R$ is 0. For example, if 1 and $(-1)^i$ are fitted for $N$ even, the mean is 0.

6.2. Approximate moments of R when $\rho = 0$. Since $R$ and $V$ are independent [8] when $\rho = 0$, $\mu_h'(R) = \mu_h'(C)/\mu_h'(V)$. $V$ is a sum of squares and its moments are the same as for $\chi^2$ with $N - K' = K''$ degrees of freedom. Using methods similar to those given by Dixon [6], we see that the moment generating function for $C$ is

(61)  $\phi(t) = \alpha(t)\cdot\beta(t)\cdot\gamma(t),$

where

(62)  $\alpha(t) = (2/A)^{N/2}, \quad \beta(t) = A^N/[A^N - (2t)^N], \quad \gamma(t) = \prod_{j'} (1 - 2t\lambda_{1j'})^{1/2}, \quad\text{and}\quad A = 1 + \sqrt{1 - 4t^2}.$



In this case, $\lambda_{1j'} = \cos\dfrac{2\pi f'}{N}$ includes all $K'$ terms corresponding to those in $m_i$. Since the first $N$ derivatives of $\beta(t)$ are zero at $t = 0$, we can use

(63)  $\bar\phi(t) = \alpha(t)\cdot\gamma(t) = \dfrac{2^{N/2}\prod_{j'} (1 - 2t\lambda_{1j'})^{1/2}}{(1 + \sqrt{1 - 4t^2})^{N/2}}$

as an approximation to (61). This expression yields the exact moments of $C$ up to order $N$.

As a special case, consider $K' = 3$, with $\lambda_{1,1} = 1$ and $\lambda_{1,2} = \lambda_{1,3} = \cos\dfrac{2\pi f'}{N}$. In this case

(64)  $\bar\phi_3(t) = \left(1 - 2t\cos\dfrac{2\pi f'}{N}\right)\bar\phi_1(t).$

Successive derivatives of (64) at $t = 0$ show that

(65)  $\mu_h'(R_3) = P\,\mu_h'(R_1) - hQ\cos\dfrac{2\pi f'}{N}\,\mu_{h-1}'(R_1),$

where $P = \mu_h'(V_1)/\mu_h'(V_3) = (N - 3 + 2h)/(N - 3)$, $Q = 2\mu_{h-1}'(V_1)/\mu_h'(V_3) = 2/(N - 3)$, and $h = 1, 2, \ldots, N$.

6.3. Approximate moment generating function of C and V when $\rho \neq 0$. To obtain an approximate moment generating function for $C$ and $V$ when $\rho \neq 0$, we utilize an approximation method given by Leipnik [10]. The exact moment generating function (58) with $\sigma^2 = 1$ can be written as

(66) <t>(ta ,<) = (! — p Y )0 exp. j- $ £ log J\ + p 1 - 2ta - 2(p -f 0 coa y j , 

where d =IIp[l + p 2 — 2ta — 2(p + and/ refers to the K‘ fitted terms 

in m<. If the sum in the exponent of (66) is replaced by 

(67) ^ log j\ + p 2 - 2/o - 2(p + t) cos dx, 

and if $(1 - \rho^N)$ is replaced by 1, we obtain the approximate moment generating function


(68) 5 = _ IL- tl + p 2 — 2tp - 2(p + t) Xi,/,] 4 

lid + P 2 - 2io + V(T+7’“2 <o) 2 - 4(p + 0*)F ’ 

7. Approximate distributions of R.

7.1. The Pearson Type I (Incomplete Beta) distribution. The significance points of ${}_LR$ can be found exactly from equation (48) for $L = 1$ and by integrating equations (52), (53), and (54) for $N = 2L$, $3L$, and $4L$, respectively. These exact probability integrals for $N = 2L$, $3L$, and $4L$ are simply sums of Incomplete Beta functions, and the significance points can be found in Pearson's Tables of the Incomplete Beta-Function [14] or in the Thompson tables [16]. However, the computation of the exact significance points for $L = 1$ and $N > 4$ by use of equation (48) is quite tedious and actually impossible for large $N$ with present logarithm tables and readily available computing devices. Hence, approximate distributions are called for.

The Type I approximation to the distribution of R is 


(69) 


MR) - 


(l + RY " 1 (l - RU 1 
2 P+ *~ 1 P(p, q) 


1<R<1, 


where p and q are chosen so that the first two moments of this approximate dis¬ 
tribution agree with the first two moments of the exact distribution. It can be 
shown that each moment of the approximate distribution approaches the corre¬ 
sponding exact moment quite rapidly as N increases. On the basis of the ap¬ 
proximation, the probability a of the significance point R' being exceeded can 
be found from the Incomplete Beta function. Thus 


(70) a « Pr[P > jR'J = 1 - 7,(p, q) = W, q'), 

where 


(71) I x (p, q) = [ y v ~ l (1 - 2/)'-* dy, 

and x — (1 -f 120/2, x' — (1 — a:), p' — q, and q' - p. Hence, R’ = 2x — 1 « 
1 - 2x'. 

The parameters in (69) are taken to be

(72)    $2p = (1 + \mu_1')(1 - \mu_2')/\mu_2, \qquad 2q = (1 - \mu_1')(1 - \mu_2')/\mu_2,$

where $\mu_2 = \mu_2' - (\mu_1')^2$ and $\mu_r' = \mu_r'(R)$ is given in (60). Hence, when the
distribution of R is symmetric, $\mu_1' = 0$ and $2p = 2q = (1 - \mu_2)/\mu_2$.
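
As an illustration of (72), the following minimal sketch fits p and q to a given mean and variance and then recovers the mean and variance of the curve (69) from the fitted values. The function names are hypothetical and the closed-form Beta moments are the standard ones, not formulas quoted from the paper.

```python
def type1_parameters(mu1, mu2):
    """Return (p, q) for the Type I curve on (-1, 1) with mean mu1 and variance mu2."""
    mu2_raw = mu2 + mu1 ** 2          # second moment about zero
    scale = (1.0 - mu2_raw) / mu2     # this quantity equals p + q
    two_p = (1.0 + mu1) * scale
    two_q = (1.0 - mu1) * scale
    return two_p / 2.0, two_q / 2.0


def type1_moments(p, q):
    """Mean and variance of R = 2X - 1 when X has the Beta(p, q) law."""
    s = p + q
    mean = (p - q) / s
    var = 4.0 * p * q / (s * s * (s + 1.0))
    return mean, var


# Symmetric example: trend (c) with N = 12, so mu2 = 1/(N - 2).
N = 12
p, q = type1_parameters(0.0, 1.0 / (N - 2))
print(p, q, type1_moments(p, q))
```

The round trip through `type1_moments` is only a consistency check on the moment matching; tail areas would still require the Incomplete Beta function of (71).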

In Section 3.1, we set up significance points for four special trends for which
$\mu_1' = 0$:

(b) P = 2; (c) P = 2, 4; (d) P = 2, 3, 6; (e) P = 2, 12/5, 3, 4, 6, 12.

The values of $\mu_2$ for these four trends are:

(b) $(N - 4)/[N(N - 2)]$, (c) $1/(N - 2)$, (d) $1/(N - 4)$, (e) $1/(N - 10)$.
Naturally the third moments for these symmetric distributions are 0. The fourth
moments are as follows:

Trend    Exact                                              Incomplete Beta

(b)      $\dfrac{3(N^2 - 2N - 16)}{(N+4)(N+2)N(N-2)}$       $\dfrac{3(N-4)^2}{N(N-2)(N^2-8)}$

(c)      $\dfrac{3(N^2 - 2N - 16)}{(N+2)N(N-2)(N-4)}$       $\dfrac{3}{N(N-2)}$

(d)      $\dfrac{3}{(N-2)(N-4)}$                            $\dfrac{3}{(N-2)(N-4)}$

(e)      $\dfrac{3}{(N-8)(N-10)}$                           $\dfrac{3}{(N-8)(N-10)}$

We note that for (d) and (e), the fourth moments for the Incomplete Beta are
exact and for (b) and (c), they approach the exact values quite rapidly as N
increases.
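
The moments quoted above can be checked numerically. The sketch below rests on assumptions that are not spelled out in this section: that under the null hypothesis R behaves as $\sum \lambda_k z_k^2 / \sum z_k^2$ with independent standard normal $z_k$, that $\lambda_k = \cos(2\pi k/N)$, and that each fitted period removes the matching $\lambda$'s. The function names are hypothetical.

```python
import math


def remaining_lambdas(N, periods):
    """cos(2*pi*k/N) for the harmonics k left after the constant and the given periods are fitted."""
    drop = {0}                                  # k = 0 is the constant term
    for P in periods:
        k = round(N / P)                        # harmonic index of period P
        drop.update({k % N, (N - k) % N})
    return [math.cos(2.0 * math.pi * k / N) for k in range(N) if k not in drop]


def central_moments(lams):
    """Mean, variance and fourth central moment of R = sum(lam*z^2)/sum(z^2)."""
    n = len(lams)
    # Cumulants of Q = sum(lam * z^2): kappa_r = 2^(r-1) (r-1)! sum(lam^r).
    k1, k2, k3, k4 = (2 ** (r - 1) * math.factorial(r - 1) * sum(l ** r for l in lams)
                      for r in (1, 2, 3, 4))
    raw_q = [k1,
             k2 + k1 ** 2,
             k3 + 3 * k2 * k1 + k1 ** 3,
             k4 + 4 * k3 * k1 + 3 * k2 ** 2 + 6 * k2 * k1 ** 2 + k1 ** 4]
    # R is independent of S = sum(z^2), so E[R^r] = E[Q^r] / E[S^r],
    # with E[S^r] = n (n + 2) ... (n + 2r - 2).
    raw_r, denom = [], 1.0
    for r in range(4):
        denom *= n + 2 * r
        raw_r.append(raw_q[r] / denom)
    m1, m2, m3, m4 = raw_r
    mu2 = m2 - m1 ** 2
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return m1, mu2, mu4


N = 24
trends = [("b", [2]), ("c", [2, 4]), ("d", [2, 3, 6]), ("e", [2, 12 / 5, 3, 4, 6, 12])]
for name, periods in trends:
    m1, mu2, mu4 = central_moments(remaining_lambdas(N, periods))
    beta_mu4 = 3.0 * mu2 ** 2 / (1.0 + 2.0 * mu2)   # fourth moment of the symmetric Type I curve
    print(name, mu2, mu4, beta_mu4)
```

The value of N must be a multiple of every fitted period for the harmonic indices to be integers; N = 24 is chosen here only because it accommodates all four trends.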










In Section 3.2 we considered some significance points for the following single-
period trends: P = 3, 4, 6, and 12. The values of 2p and 2q for these asymmetrical
cases are

(73)    $2p = \dfrac{(N - 4 - 2\lambda)E}{D}; \qquad 2q = \dfrac{(N - 2 + 2\lambda)E}{D},$

where $\lambda = \cos(2\pi/P)$, $E = (N - 1)(N - 4) - 4\lambda$ and
$D = (N - 3)(N - 1 + 4\lambda) - (N - 1)(1 + 2\lambda)^2$.
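
A short numerical check of (73) against the general rule (72) may be useful. It assumes the single-period mean $-(1 + 2\lambda)/(N - 3)$ and takes the second moment about zero to be $(N - 1 + 4\lambda)/[(N - 1)(N - 3)]$; the latter is an assumption of this sketch, chosen because it reproduces D, and the function names are hypothetical.

```python
import math


def single_period_parameters(N, P):
    """(2p, 2q) from the closed form (73)."""
    lam = math.cos(2.0 * math.pi / P)
    E = (N - 1) * (N - 4) - 4.0 * lam
    D = (N - 3) * (N - 1 + 4.0 * lam) - (N - 1) * (1.0 + 2.0 * lam) ** 2
    return (N - 4 - 2.0 * lam) * E / D, (N - 2 + 2.0 * lam) * E / D


def parameters_from_moments(N, P):
    """(2p, 2q) from rule (72), using the assumed first two moments."""
    lam = math.cos(2.0 * math.pi / P)
    mu1 = -(1.0 + 2.0 * lam) / (N - 3)
    mu2_raw = (N - 1 + 4.0 * lam) / ((N - 1) * (N - 3))
    mu2 = mu2_raw - mu1 ** 2
    return (1 + mu1) * (1 - mu2_raw) / mu2, (1 - mu1) * (1 - mu2_raw) / mu2


for P in (3, 4, 6, 12):
    print(P, single_period_parameters(24, P), parameters_from_moments(24, P))
```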

Equation (69) has the drawback of using the range (-1, +1) instead of the
true range of R, which extends from the last (smallest) to the first (largest)
of the $\lambda$'s that remain after the fitted terms are removed. For example, if
N = 12 and we fit the constant, $\cos(2\pi i/12)$, and $\sin(2\pi i/12)$, then the
fitted terms correspond to $\lambda = 1$ and to the pair $\lambda = \cos(2\pi/12)$, and the
range of R is $(-1, \cos(4\pi/12))$. However, if we fit the constant and
$\cos \pi i = (-1)^i$, then the fitted terms correspond to $\lambda = 1$ and $\lambda = -1$, and
the true range would be $(-\cos(2\pi/12), \cos(2\pi/12))$. From these examples we
see that the error in using the approximate range (-1, +1) varies according to
the fitted terms in $m_i$, and that the error is worse on one tail than on the
other, unless symmetric terms are fitted. A more accurate approximation could
be obtained by use of the exact curtailed range, but it was not thought
desirable because the exact range rapidly approaches the approximate range as
N increases.
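
The two examples can be reproduced with a few lines. This is a sketch under the assumption that the relevant $\lambda$'s are $\cos(2\pi k/N)$ and that fitting harmonic k removes both k and N - k; `true_range` is a hypothetical helper name.

```python
import math


def true_range(N, fitted_harmonics):
    """Smallest and largest cos(2*pi*k/N) over the harmonics not removed by the fit."""
    removed = set()
    for k in fitted_harmonics:
        removed.update({k % N, (N - k) % N})
    lams = [math.cos(2.0 * math.pi * k / N) for k in range(N) if k not in removed]
    return min(lams), max(lams)


print(true_range(12, [0, 1]))   # constant plus the period-12 cosine and sine
print(true_range(12, [0, 6]))   # constant plus cos(pi*i) = (-1)^i
print(true_range(120, [0, 1]))  # the curtailment shrinks as N grows
```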

We might add that the significance point, R', can also be calculated from the
Inverted Beta (F) distribution, for which tables are given by Merrington and
Thompson [13], Snedecor [15], and Fisher and Yates [6]. Cochran [4] has provided
an approximate formula for $z = \tfrac{1}{2}\log_e F$ when $n_1$ and $n_2$ are not given in the
F-tables.

7.2. The normal approximation. It should be noted that R is asymptotically
normally distributed for $\rho = 0$, as shown by the form of the characteristic
function. We have considered the normal approximation with mean $\mu_1'(R)$ and
variance $\mu_2(R)$. The variance of R was given in the previous section for the four
special trends. For all single period trends, except P = 2, $\mu_1' = -(1 + 2\lambda)/(N - 3)$
and the variance, $\mu_2 = \mu_2' - (\mu_1')^2$, is

(74)    $\mu_2 = \dfrac{N - 1 + 4\lambda}{(N - 1)(N - 3)} - \dfrac{(1 + 2\lambda)^2}{(N - 3)^2},$

where, as before, $\lambda = \cos(2\pi/P)$. Further terms in an asymptotic expansion of
the distribution would take account of higher moments of R as Hsu has done
for the case of fitting only the mean ($m_i$ = a constant) [7].
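
A minimal sketch of the normal approximation for a single-period trend follows. It assumes the mean and variance written above (the variance as second moment about zero minus the squared mean) and uses the standard library's normal quantile; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_significance_point(N, P, alpha):
    """Upper significance point R' with Pr[R > R'] roughly alpha under the normal approximation."""
    lam = math.cos(2.0 * math.pi / P)
    mean = -(1.0 + 2.0 * lam) / (N - 3)
    second = (N - 1 + 4.0 * lam) / ((N - 1) * (N - 3))   # second moment about zero
    variance = second - mean ** 2
    z = NormalDist().inv_cdf(1.0 - alpha)
    return mean + z * math.sqrt(variance)


for alpha in (0.05, 0.01):
    print(alpha, normal_significance_point(48, 12, alpha))
```

Because the exact distribution is skew for these trends, a sketch of this kind is best read alongside the Type I approximation of Section 7.1 rather than as a replacement for it.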


REFERENCES

[1] R. L. Anderson, Serial correlation in the analysis of time series, unpublished thesis,
Library, Iowa State College, 1941.

[2] R. L. Anderson, "Distribution of the serial correlation coefficient," Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[3] T. W. Anderson, "On the theory of testing serial correlation," Skandinavisk
Aktuarietidskrift, Vol. 31 (1948), pp. 88-110.

[4] W. G. Cochran, "Note on an approximate formula for the significance levels of z,"
Annals of Math. Stat., Vol. 11 (1940), pp. 93-95.

[5] W. J. Dixon, "Further contributions to the problem of serial correlation," Annals of
Math. Stat., Vol. 15 (1944), pp. 119-144.

[6] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical
Research, 2d ed., Oliver and Boyd Ltd., 1943.

[7] P. L. Hsu, "On the asymptotic distributions of certain statistics used in testing the
independence between successive observations from a normal population," Annals
of Math. Stat., Vol. 17 (1946), pp. 350-364.

[8] T. Koopmans, "Serial correlation and quadratic forms in normal variables," Annals of
Math. Stat., Vol. 13 (1942), pp. 14-23.

[9] E. L. Lehmann, "On optimum tests of composite hypotheses with one constraint,"
Annals of Math. Stat., Vol. 18 (1947), p. 481.

[10] R. B. Leipnik, "Distribution of the serial correlation coefficient in a circularly
correlated universe," Annals of Math. Stat., Vol. 18 (1947), pp. 80-87.

[11] W. G. Madow, "Contributions to the theory of multivariate statistical analysis,"
Trans. Am. Math. Soc., Vol. 44 (1938), p. 461.

[12] W. G. Madow, "Note on the distribution of the serial correlation coefficient," Annals
of Math. Stat., Vol. 16 (1945), pp. 308-310.

[13] M. Merrington and C. M. Thompson, "Tables of percentage points of the inverted
Beta (F) distribution," Biometrika, Vol. 33 (1943), pp. 73-88.

[14] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge Univ. Press, 1934.

[15] G. W. Snedecor, Statistical Methods, 4th ed., Iowa State College Press, 1946, pp.
222-225.

[16] C. M. Thompson, "Tables of percentage points of the incomplete beta-function,"
Biometrika, Vol. 32 (1941), pp. 161-181.

[17] United States Department of Agriculture, Agricultural Statistics, United States
Government Printing Office, Washington, D. C., 1939, p. 390.



BAYES SOLUTIONS OF SEQUENTIAL DECISION PROBLEMS 
By A. Wald and J. Wolfowitz 
Columbia University 

Summary. The study of sequential decision functions was initiated by one of
the authors in [1]. Making use of the ideas of this theory the authors succeeded in
[4] in proving the optimum character of the sequential probability ratio test.
In the present paper the authors continue the study of sequential decision
functions, as follows:

a) The proof of the optimum character of the sequential probability ratio
test was based on a certain property of Bayes solutions for sequential decisions
between two alternatives, the cost function being linear. This fundamental
property, the convexity of certain important sets of a priori distributions, is
proved in Theorem 3.9 in considerable generality. The number of possible
decisions may be infinite.

b) Theorem 3.10 and section 4 discuss tangents and boundary points of these
sets of a priori distributions.

(These results for finitely many alternatives were announced by one of us
in an invited address at the Berkeley meeting of the Institute of Mathematical
Statistics in June, 1948.)¹

c) Theorem 3.6 is an existence theorem for Bayes solutions. Theorem 3.7
gives a necessary and sufficient condition for a Bayes solution. These theorems
generalize and follow the ideas of Lemma 1 of [4].

d) Theorems 3.8 and 3.8.1 are continuity theorems for the average risk
function. They generalize Lemma 3 in [4].

e) Other theorems give recursion formulas and inequalities which govern
Bayes solutions.

1. Introduction. In a previous publication of one of the authors [1] the decision
problem was formulated as follows: Let $X = \{X_i\}$ $(i = 1, 2, \ldots,$ ad inf.) be
a sequence of chance variables. An observation on X is given by a sequence
$x = \{x_i\}$ $(i = 1, 2, \ldots,$ ad inf.) of real values, where $x_i$ denotes the observed
value of $X_i$. A sequence x is also called a sample or sample point, and the totality
M of all possible sample points x is called the sample space. Let G(x) denote the
probability that $X_i < x_i$ for $i = 1, 2, \ldots,$ ad inf.; i.e., G is the cumulative
distribution function of X. In a statistical decision problem G is assumed to be
unknown. It is merely known that G is an element of a given class $\Omega$ of distribution
functions. There is given, furthermore, a space D* whose elements d represent
the possible decisions that can be made in the problem under consideration.
The problem is to construct a function d = D(x), called the decision function,
which associates with each sample point x an element d of D* so that the decision
d = D(x) is made when x is observed.

¹ A brief statement of some of the results of the present paper is to be found
in the authors' paper of the same name in the Proc. Nat. Acad. Sci. U.S.A., Vol. 35 (1949).

Occasionally we shall use the symbol D to denote a decision function D(x).
This will be done especially when we want to emphasize that we mean the whole
decision function and not merely a particular value of it corresponding to some
particular x.

If d = D(x) is the decision function adopted and if $x^0 = \{x_i^0\}$ $(i = 1, 2, \ldots)$
is the particular sample point observed, the number of components of $x^0$ we have
to observe in order to reach a decision is equal to the smallest positive integer
$n = n(x^0)$ with the property that $D(x) = D(x^0)$ for any x for which $x_1 = x_1^0, \ldots,$
$x_n = x_n^0$. If no finite n exists with the above property, we put $n(x^0) = \infty$. If
D(x) is equal to a constant d, we put n(x) = 0. We shall call n(x) the number
of observations required by D when x is the observed sample. Of course, n(x)
depends also on the decision rule D adopted. To put this in evidence, we shall
occasionally write n(x, D) instead of n(x). If $D_0$ is a decision function such that
$n(x, D_0)$ has a constant value over the whole sample space M, we have the classical
non-sequential case. If $n(x, D_0)$ is not constant, we shall say that $D_0$ is a sequential
decision function.

In the remainder of this section we shall sketch briefly some of the fundamental 
notions of the theory without regard to regularity conditions. The latter will be 
discussed in the next section. 

In [1] a weight function W(G, d) was introduced which expresses the loss suffered
by the statistician when G is the true distribution of X and the decision d is
made. Let c(n) denote the cost of making n observations; i.e., c(n) is the cost
of observing the values of $X_1, \ldots, X_n$. Then, if the decision function d = D(x)
is adopted and G is the true distribution of X, the expected value of the loss due
to possible erroneous decisions plus the expected cost of experimentation is
given by

(1.1)    $r(G, D) = \int_M W[G, D(x)]\,dG(x) + \int_M c[n(x, D)]\,dG(x).$

The above expression is called the risk when D is the decision function adopted
and G is the true distribution.

Let $\xi$ be an a priori probability distribution on $\Omega$; i.e., $\xi$ is a probability measure
defined over a suitably chosen Borel field² of subsets of $\Omega$. Then the expected value
of r(G, D) is given by

(1.2)    $r(\xi, D) = \int_\Omega r(G, D)\,d\xi.$



² A Borel field is an aggregate of sets such that a) the null set is a member of the field,
b) the complement with respect to the entire space (here M) of any member of the field is
a member of the field, c) the sum of denumerably many members of the field is itself in
the field.

The above expression is called the risk when $\xi$ is the a priori distribution on $\Omega$
and D is the decision function adopted.
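
The definitions (1.1) and (1.2) can be made concrete on a toy problem. The sketch below is an illustration only: the two Bernoulli laws, the loss value and the majority rule are hypothetical choices, and the cost is taken as c(n) = n, as in Assumption 4 of the next section.

```python
W0 = 20.0             # hypothetical loss for a wrong terminal decision
THETAS = (0.3, 0.7)   # hypothetical class Omega: Bernoulli laws F_0 and F_1


def fixed_sample_rule(n):
    """Non-sequential D: observe exactly n values, then choose the law favoured by the majority."""
    def rule(prefix):
        if len(prefix) < n:
            return None                     # None means: take another observation
        return 1 if 2 * sum(prefix) > n else 0
    return rule


def risk(k, rule, prefix=()):
    """r(F_k, D) of (1.1) with c(n) = n, by enumerating observation sequences.

    The recursion ends only for rules that stop after finitely many observations.
    """
    d = rule(prefix)
    if d is not None:
        return (0.0 if d == k else W0) + len(prefix)
    th = THETAS[k]
    return th * risk(k, rule, prefix + (1,)) + (1.0 - th) * risk(k, rule, prefix + (0,))


def bayes_risk(xi1, rule):
    """r(xi, D) of (1.2) when xi puts weight xi1 on F_1 and 1 - xi1 on F_0."""
    return (1.0 - xi1) * risk(0, rule) + xi1 * risk(1, rule)


for n in range(6):
    print(n, bayes_risk(0.5, fixed_sample_rule(n)))
```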

We shall say that the decision function $D_0$ is a Bayes solution relative to the
a priori distribution $\xi$ if

(1.3)    $r(\xi, D_0) \le r(\xi, D)$ for all D.

If there existed an a priori distribution on $\Omega$ and if this distribution were known,
we could put $\xi$ equal to this a priori distribution and a Bayes solution relative to
$\xi$ would provide a very satisfactory solution of the decision problem. In most
applications, however, not even the existence of an a priori distribution can be
postulated. Nevertheless, the study of Bayes solutions corresponding to various
a priori distributions is of great interest in view of some results given in [1].
It was shown in [1] that under rather general conditions the class C of the Bayes
solutions corresponding to all possible a priori distributions $\xi$ has the following
property: If $D_1$ is a decision function that is not an element of C, there exists
a decision function $D_2$ in C such that

(1.4)    $r(G, D_2) \le r(G, D_1)$ for all G

and

(1.5)    $r(G, D_2) < r(G, D_1)$ for at least one G.

It was furthermore shown in [1] that under general conditions a minimax
solution $D_0$ of the decision problem is also a Bayes solution corresponding to
some a priori distribution $\xi$. By a minimax solution we mean a decision function
$D_0$ such that, for all D,

(1.6)    $\sup_G r(G, D_0) \le \sup_G r(G, D).$


2. Regularity conditions and other assumptions. We shall make the following
assumptions:

Assumption 1. The chance variables are identically and independently distributed.
The common distribution is either discrete or absolutely continuous.

Let $p(a \mid F)$ denote the elementary probability law of $X_i$ when F is the
distribution of $X_i$; i.e., when F is discrete, $p(a \mid F)$ is the probability that $X_i = a$,
and when F is absolutely continuous, $p(a \mid F)$ is the probability density of $X_i$
[...] Let B be the smallest Borel field which contains
all sets of points x which are defined by the relations

[...]

Each admissible³ F* is a probability measure on B; the totality of these
probability measures is $\Omega$. Let H* be a given Borel field of subsets of $\Omega$. The
only subsets of $\Omega$ which we shall discuss in this paper will be members of H*,
and all probability measures on $\Omega$ which we shall discuss will be measurable (H*).
This will henceforth be assumed without further repetition.

³ An F or F* is admissible if F* is in $\Omega$.

Let A* be any set in H*, and A the set of F which corresponds to the F* in
A*. The sets A form a Borel field, say H. By definition, the probability measure
of a set A according to a probability measure $\xi$ (H*) on $\Omega$ is to be the same as the
probability measure of A* according to $\xi$.

Let $M \times \Omega$ be the Cartesian product of M and $\Omega$ ([5], page 82), and K be the
smallest Borel field of subsets of $M \times \Omega$ which contains the Cartesian product
of any member of B by any member of H*.

For a given decision function d = D(x), W(F, D(x)) is a function of F and x.
Hereafter in this paper we shall limit ourselves to functions D(x) such that
W(F, D(x)) is measurable (K), and n(x, D) is measurable (B).

It is true that in Section 1, W was given as a function of G, the distribution of
X. Because of Assumption 1, G = F*, and there is a one-to-one correspondence
between F and F*. Thus we may, in appropriate places, interchange them freely.

Assumption 2. For every real a, except possibly on a Borel set* whose probability
is zero according to every admissible F, $p(a \mid F)$ exists and is a function of a and
F which is measurable (K). If the admissible distributions F are discrete, there exists a
fixed sequence $\{b_i\}$ $(i = 1, 2, \ldots,$ ad inf.) of real values such that
$\sum_{i=1}^{\infty} p(b_i \mid F) = 1$ for all admissible F.

Assumption 3. W(F, d) is bounded. For every d in D*, W(F, d) is a function
of F which is measurable (H).

In what follows $\xi$ will always denote a probability measure (H*) on $\Omega$. Thus

$W(\xi, d) = \int_\Omega W(F, d)\,d\xi$

exists.

Assumption 4. The function c(n) = cn. Without loss of generality we may
take c = 1, so that c(n) = n.

We shall introduce the following convergence definition in the space D*:
the sequence $\{d_i\}$ converges to $d_0$ if

$\lim_{i \to \infty} W(F, d_i) = W(F, d_0)$

uniformly in the admissible F's.

Assumption 5. The space D* is compact in the sense of the above convergence
definition.

One can easily verify that, if $\lim_{i \to \infty} d_i = d_0$, then

$\lim_{i \to \infty} W(\xi, d_i) = W(\xi, d_0);$

i.e., $W(\xi, d)$ is a continuous function of d. Thus, because of Assumption 5, the
minimum of $W(\xi, d)$ with respect to d exists.

* A Borel set is a member of the smallest Borel field which contains all the open sets of
the real line.

We shall now show that, under the above conditions,

(2.1)    $\int_M W[F^*, D(x)]\,dF^*(x)$

exists and is a function of F* measurable (H*). For any j let $R_j$ be the set in B
such that n(x, D) = j. Then it is enough to show that, for any j,

(2.2)    $\int_{R_j} W[F^*, D(x)]\,dF^*(x)$

exists and is a function of F* measurable (H*).

In the discrete case, the integral (2.2) is equal to the sum

(2.3)    $\sum_{R_j} W[F^*, D(x)]\,p(x_1 \mid F) \cdots p(x_j \mid F).$

For fixed values of $x_1, \ldots, x_j$, the expression under the summation sign is
obviously a function of F* measurable (H*). Since, because of Assumption 2,
there are only countably many points $(x_1, \ldots, x_j)$ in $R_j$, the sum (2.3) must
be a function of F* measurable (H*).

In the absolutely continuous case, the integral (2.2) is equal to

(2.4)    $\int_{R_j} W[F^*, D(x)] \prod_{i=1}^{j} p(x_i \mid F)\,d\nu(j)$

where $\nu(j)$ is Borel measure in the j-dimensional Euclidean space. The integrand
is measurable (K). Hence, the integral (2.4) exists and is a function of F*
measurable (H*) (see [5], Chapter III, Theorems 9.3 and 9.8).


3. Some results concerning Bayes solutions. If $\xi$ is the a priori probability
measure on $\Omega$, the a posteriori probability of a subset $\omega$ of $\Omega$ for given values
$x_1, \ldots, x_m$ of the first m chance variables is given by

(3.1)    $\xi(\omega \mid \xi, x_1, \ldots, x_m) = \dfrac{\int_\omega p(x_1 \mid F) \cdots p(x_m \mid F)\,d\xi}{\int_\Omega p(x_1 \mid F) \cdots p(x_m \mid F)\,d\xi}.$

Let

(3.2)    $\rho_0(\xi) = \operatorname{Min}_d W(\xi, d).$
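
For a finite class $\Omega$ and finitely many terminal decisions, (3.1) and (3.2) reduce to elementary sums. The following sketch illustrates that special case only; the list-based representation, the function names and the Bernoulli example are hypothetical.

```python
def posterior(xi, likelihoods, xs):
    """(3.1) for a finite class: xi[k] is the prior weight of F_k and likelihoods[k](x) = p(x | F_k)."""
    weights = list(xi)
    for x in xs:
        weights = [w * p(x) for w, p in zip(weights, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def rho0(xi, W):
    """(3.2): minimum over terminal decisions d of W(xi, d) = sum_k xi[k] * W[k][d]."""
    decisions = range(len(W[0]))
    return min(sum(xi[k] * W[k][d] for k in range(len(xi))) for d in decisions)


def bernoulli(theta):
    """Elementary probability law p(x | F) of a Bernoulli variable with success probability theta."""
    return lambda x: theta if x == 1 else 1.0 - theta


laws = [bernoulli(0.3), bernoulli(0.7)]
loss = [[0.0, 20.0],    # W(F_0, d_0), W(F_0, d_1)
        [20.0, 0.0]]    # W(F_1, d_0), W(F_1, d_1)
xi = [0.5, 0.5]
print(rho0(xi, loss))
print(posterior(xi, laws, [1, 1, 0]))
```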

For any positive integral value m, let $\rho_m(\xi)$ denote the infimum of $r(\xi, D)$ with
respect to D, where D is restricted to decision functions for which $n(x, D) \le m$
for all x. For any positive integer m, let $d = D^m(x)$ denote a decision function
D for which $n(x, D) \le m$ for all x. Thus, we can write

(3.3)    $\rho_m(\xi) = \operatorname{Inf}_{D^m} r(\xi, D^m)$    $(m = 1, 2, \ldots,$ ad inf.).

Let

(3.4)    $\rho(\xi) = \operatorname{Inf}_D r(\xi, D).$

We shall first prove several theorems concerning the functions $\rho_0(\xi)$, $\rho_m(\xi)$,
and $\rho(\xi)$.

[...] (2.3) and (2.4), proceed as [...]

Theorem 3.1. The following recursion formula holds:

(3.5)    $\rho_{m+1}(\xi) = \operatorname{Min}\left[\rho_0(\xi),\; 1 + \int_{-\infty}^{\infty} \rho_m(\xi_a)\,p(a \mid \xi)\,da\right]$    $(m = 0, 1, 2, \ldots,$ ad inf.)

where

(3.6)    $\xi_a(\omega) = \xi(\omega \mid \xi, a)$ and $p(a \mid \xi) = \int_\Omega p(a \mid F)\,d\xi.$

Proof: Let $\rho_m^*(\xi)$ $(m = 1, 2, \ldots,$ ad inf.) denote the infimum of $r(\xi, D)$ with
respect to D where D is subject to the restriction that $n(x, D) \ge 1$ and $\le m$ for
all x. Clearly,

(3.7)    $\rho_{m+1}(\xi) = \operatorname{Min}[\rho_0(\xi), \rho_{m+1}^*(\xi)].$

Let $\rho_m^*(\xi \mid a)$ denote the infimum with respect to D of the conditional risk
(conditional expected value of W[F, D(x)] + n(x, D)) when the first observation
$x_1$ on $X_1$ is a and D is restricted to decision functions for which $n(x, D) \ge 1$ and
$\le m$ for all x. Let D(m) be the temporary generic designation of such a decision
function. Let $D(m \mid a)$ be the decision function which is obtained from D(m)
when the first observation is a. Finally let $r(\xi, D \mid a)$ be the conditional risk when
the a priori distribution function is $\xi$, D is the decision function and requires at
least one observation, and the first observation is a. We then have that

$r(\xi, D(m + 1) \mid a) = r(\xi_a, D(m + 1 \mid a)) + 1.$

Hence

(3.8)    $\rho_{m+1}^*(\xi \mid a) = \rho_m(\xi_a) + 1.$

The unconditional quantity $\rho_{m+1}^*(\xi)$ must clearly be equal to the average value
of the infimum of the conditional risk. Thus we have

(3.9)    $\rho_{m+1}^*(\xi) = \int_{-\infty}^{\infty} \rho_{m+1}^*(\xi \mid a)\,p(a \mid \xi)\,da.$

Equation (3.5) follows from (3.7), (3.8) and (3.9).

* If the distribution of X is discrete, the integration with respect to a is to be replaced
by summation with respect to a. This remark refers also to subsequent formulas.
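
The recursion (3.5) lends itself to direct computation when $\Omega$ is finite and the observations are discrete, in which case the integral becomes a sum. The sketch below is an illustration under hypothetical choices: two Bernoulli laws, a 0 or $W_0$ loss, and the prior summarized by the weight g that it puts on the second law.

```python
W0 = 20.0            # hypothetical loss for a wrong terminal decision
THETA = (0.3, 0.7)   # hypothetical success probabilities under F_0 and F_1


def rho0(g):
    """rho_0(xi) when xi puts weight g on F_1: the better of the two terminal decisions."""
    return W0 * min(g, 1.0 - g)


def update(g, a):
    """Return (weight of F_1 in xi_a, p(a | xi)) for the observation a in {0, 1}."""
    p1 = THETA[1] if a == 1 else 1.0 - THETA[1]
    p0 = THETA[0] if a == 1 else 1.0 - THETA[0]
    marginal = g * p1 + (1.0 - g) * p0
    return g * p1 / marginal, marginal


def rho_m(g, m):
    """rho_m(xi) from recursion (3.5), with the integral over a replaced by a sum."""
    if m == 0:
        return rho0(g)
    continuation = 1.0                       # cost of the next observation, c = 1
    for a in (0, 1):
        g_a, p_a = update(g, a)
        continuation += p_a * rho_m(g_a, m - 1)
    return min(rho0(g), continuation)


for m in range(0, 9, 2):
    print(m, rho_m(0.5, m))
```

The plain recursion visits $2^m$ sequences, which is adequate for small m; for larger m one would tabulate $\rho_m$ on the reachable values of g instead.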

Theorem 3.2. The function $\rho(\xi)$ satisfies the following equation:

(3.10)    $\rho(\xi) = \operatorname{Min}\left[\rho_0(\xi),\; \int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da + 1\right].$

The proof of this theorem is omitted, since it is essentially the same as that of
Theorem 3.1.

Theorem 3.3.⁷ The following inequalities hold:

(3.11)    $0 \le \rho_m(\xi) - \rho(\xi) \le \dfrac{W_0^2}{m}$    $(m = 1, 2, \ldots,$ ad inf.)

where $W_0$ is the least upper bound of W(F, d).

Proof: Let $\{D_i\}$ $(i = 1, 2, \ldots,$ ad inf.) be a sequence of decision functions
such that

(3.12)    $\lim_{i \to \infty} r(\xi, D_i) = \rho(\xi).$

Let, furthermore, $P_i(\xi)$ denote the probability that at least m observations will
be made when $D_i$ is the decision function adopted and $\xi$ is the a priori probability
measure on $\Omega$. Since $\rho(\xi) \le W_0$ and since

(3.13)    $r(\xi, D_i) \ge m P_i(\xi),$

it follows from (3.12) that

(3.14)    $\limsup_{i \to \infty} P_i(\xi) \le \dfrac{W_0}{m}.$

Let $D_i^m$ be the decision function obtained from $D_i$ as follows: $D_i^m(x) = D_i(x)$
for all x for which $n(x, D_i) \le m$. $D_i^m(x)$ is equal to a fixed element $d_0$ for all x for
which $n(x, D_i) > m$.

Clearly,

(3.15)    $r(\xi, D_i^m) \le r(\xi, D_i) + P_i(\xi) W_0.$

From (3.12), (3.14) and (3.15) it follows that

(3.16)    $\limsup_{i \to \infty} r(\xi, D_i^m) \le \rho(\xi) + \dfrac{W_0^2}{m}.$

Since $\rho_m(\xi)$ does not exceed the left hand member of (3.16), the second half of
(3.11) follows from (3.16). The first half of (3.11) is obvious.

⁷ This theorem is essentially the same as Lemma 2.1 in 101

(f, ?sulh f th^^ ( p"t!;< meMU h rable ( * ) '“ follow the set V of couples 

v v V ik " ^ 1 , (2)) < where c » some real constant. Wo want to show that 

Hr rr ? r* t the aet of coupies (f - •> ■»* ««r S < c £ 

V V th ^ 8et ° f 2 8 8Uch tbat A) 5 m, Then F,< B, roT^J£T 

the Lt of /"a suclfhaT W(F *) <7^7° 'g? -?(*>, d*)Ut Vi be 

V„V, + X F s , so that v Ik Th ^ * H by A ® sunl Ption 3. Finally we have V ~ 





An immediate consequence of Theorem 3.3 is the relation⁹

(3.17)    $\lim_{m \to \infty} \rho_m(\xi) = \rho(\xi).$

Theorem 3.4. If $\xi_1$ and $\xi_2$ are two probability measures on $\Omega$ such that¹⁰

(3.18)    $\dfrac{\xi_1(\omega)}{\xi_2(\omega)} \le 1 + \epsilon$ for all $\omega$,

then

(3.19)    $\rho(\xi_1) \le (1 + \epsilon)\rho(\xi_2).$

Proof: It follows from (3.18) that

(3.20)    $r(\xi_1, D) \le (1 + \epsilon) r(\xi_2, D)$ for all D.

Hence, (3.19) must hold.

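The above theorem permits the computation of a simple and in many cases
useful lower bound of $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da$ as follows:
For any real value a, let $\epsilon_a$ be a non-negative value (not necessarily finite)
determined such that

(3.21)    $\dfrac{\xi(\omega)}{\xi_a(\omega)} \le 1 + \epsilon_a$ for all $\omega$.

Then

(3.22)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \ge \int_{-\infty}^{\infty} \dfrac{\rho(\xi)}{1 + \epsilon_a}\,p(a \mid \xi)\,da = \rho(\xi) \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da.$

Since $\epsilon_a \ge 0$ and since $\rho_0(\xi) \ge \rho(\xi)$, we obviously have

(3.23)    $\rho(\xi) \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da \ge \rho(\xi) - \rho_0(\xi)\left[1 - \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da\right].$

Hence, we obtain the inequality

(3.24)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \ge \rho(\xi) - \rho_0(\xi)\left[1 - \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da\right].$

An upper bound of the left hand member in (3.24) is obtained by replacing
$\rho$ by $\rho_0$; i.e.,

(3.25)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \le \int_{-\infty}^{\infty} \rho_0(\xi_a)\,p(a \mid \xi)\,da.$

⁹ A proof of (3.17) is contained implicitly in the work of Arrow, Blackwell and Girshick
([2], Section 1.3).

¹⁰ The left member of (3.18) is defined to be equal to 1 when $\xi_1(\omega) = \xi_2(\omega) = 0$.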


The bounds given in (3.24) and (3.25) may be useful in constructing Bayes
solutions, since the following theorem holds:

Theorem 3.5. If

(3.26)    $\rho_0(\xi) > \int_{-\infty}^{\infty} \rho_0(\xi_a)\,p(a \mid \xi)\,da + 1,$

then $\rho(\xi) < \rho_0(\xi)$. If

(3.27)    $\rho_0(\xi)\left[1 - \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da\right] < 1,$

then $\rho(\xi) = \rho_0(\xi)$.

The above theorem is an immediate consequence of (3.10), (3.24) and (3.25).
A decision procedure relative to a given a priori probability measure $\xi_0$ will
be given with the help of the function $\rho(\xi)$ as follows: If $\rho(\xi_0) = \rho_0(\xi_0)$, take a
final decision d for which $W(\xi_0, d)$ is minimized. If $\rho(\xi_0) < \rho_0(\xi_0)$, take an
observation on $X_1$ and compute the a posteriori probability measure $\xi_1$. If
$\rho(\xi_1) = \rho_0(\xi_1)$, stop experimentation with a final decision d for which $W(\xi_1, d)$
is minimized. If $\rho(\xi_1) < \rho_0(\xi_1)$, take an observation on $X_2$ and compute the a
posteriori probability measure $\xi_2$ corresponding to the observed values of $X_1$ and
$X_2$, and so on. The above decision procedure will be shown later to be a Bayes
solution. Theorem 3.5 permits one to decide whether $\rho(\xi) < \rho_0(\xi)$ or $= \rho_0(\xi)$
whenever $\xi$ satisfies (3.26) or (3.27). Theorem 3.5 will be useful when the class of
all $\xi$'s for which neither (3.26) nor (3.27) holds is small.
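
The stopping procedure just described can be sketched for a two-point class of Bernoulli laws. This is an illustration only: the laws, the loss $W_0$ and the prior weight g are hypothetical, and $\rho$ is replaced by the truncated $\rho_m$ with a fixed horizon m, an approximation suggested by (3.17) rather than the exact rule.

```python
W0 = 20.0            # hypothetical loss for a wrong terminal decision
THETA = (0.3, 0.7)   # hypothetical success probabilities under F_0 and F_1


def rho0(g):
    """Risk of stopping at once when the current measure puts weight g on F_1."""
    return W0 * min(g, 1.0 - g)


def update(g, a):
    """Return (a posteriori weight of F_1, probability of observing a)."""
    p1 = THETA[1] if a == 1 else 1.0 - THETA[1]
    p0 = THETA[0] if a == 1 else 1.0 - THETA[0]
    marginal = g * p1 + (1.0 - g) * p0
    return g * p1 / marginal, marginal


def rho_m(g, m):
    """Truncated risk computed from recursion (3.5) with unit cost per observation."""
    if m == 0:
        return rho0(g)
    continuation = 1.0 + sum(p * rho_m(ga, m - 1) for ga, p in (update(g, a) for a in (0, 1)))
    return min(rho0(g), continuation)


def sequential_decision(g, observations, horizon=10):
    """Stop as soon as rho(xi) = rho_0(xi); otherwise observe and pass to the a posteriori measure.

    If the supplied observations run out first, a terminal decision is made anyway.
    """
    taken = 0
    for a in observations:
        if rho_m(g, horizon) >= rho0(g) - 1e-12:
            break
        g, _ = update(g, a)
        taken += 1
    decision = 1 if g > 0.5 else 0
    return decision, taken, g


print(sequential_decision(0.5, [1, 1, 1, 1, 1, 1]))
print(sequential_decision(0.5, [1, 0, 0, 1, 0, 0, 0, 0]))
```

With these hypothetical values the rule takes no observation at all when the prior is already decisive, since then $\rho_0(\xi) < 1$ and a single observation costs more than the risk of deciding at once.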

For the purposes of the next theorem let $\mathfrak{D}$ designate the decision procedure
described in the preceding paragraph. (We shall shortly show that $\mathfrak{D}$ is a decision
function in the sense of our definition.)

Let $\mathfrak{D}^0$ be the decision procedure where the first observation is taken and then
one proceeds according to $\mathfrak{D}$.

We shall now prove that $\mathfrak{D}$ and $\mathfrak{D}^0$ are Bayes solutions. More precisely, we
shall prove the following theorem:¹¹

Theorem 3.6. For any $\xi$, $\mathfrak{D}$ and $\mathfrak{D}^0$ as defined above are decision functions.
Let D be any decision function for which $n(x, D) \ge 1$ and let

$\rho^*(\xi) = \operatorname{Inf}_D r(\xi, D).$

Then

$r(\xi, \mathfrak{D}) = \rho(\xi)$

and

$r(\xi, \mathfrak{D}^0) = \rho^*(\xi).$



Thlr^?4°aada3! l0 m &1 ? h t T r B0 ™ morc general exi8teuoe theorems <|0], 

11 Tta « Ti ""» 3 »«• 





In view of this theorem, the operation "infimum with respect to D" in the
definitions of $\rho(\xi)$ and $\rho^*(\xi)$ can be replaced by "minimum with respect to D."

First we shall establish the measurability properties of $\mathfrak{D}$ and $\mathfrak{D}^0$. Since the
proofs are similar, we restrict ourselves to consideration of $\mathfrak{D}$. Let $\xi_{x_1, \ldots, x_m}$ be
the a posteriori distribution (3.1). From the (B) measurability of $\rho_0(\xi_{x_1, \ldots, x_m})$
and $\rho(\xi_{x_1, \ldots, x_m})$, it follows easily that $n(x, \mathfrak{D})$ is measurable (B). It remains to
prove that $W(F, \mathfrak{D}(x))$ is measurable (K). For this purpose, let $V_i = (d_{i1}, \ldots, d_{ik_i})$
be a sequence $\frac{1}{i}$-dense in D*, i.e., for any $d \in D^*$ there exists a $g \in D^*$ such that
$g \in V_i$ and $|W(F, d) - W(F, g)| < \frac{1}{i}$ uniformly in F. (The existence of such
a sequence follows from Assumption 5.) Let now $D_i(x)$ be a decision function
defined as follows:

$n(x, D_i) = n(x, \mathfrak{D}).$

Suppose $n(x, \mathfrak{D}) = m$ when the observations are $x_1, \ldots, x_m$. We define $D_i(x)$
to be such that $D_i(x)$ is an element of $V_i$ and

(3.28)    $W(\xi_{x_1, \ldots, x_m}, D_i(x)) = \operatorname{Min}_{d \in V_i} W(\xi_{x_1, \ldots, x_m}, d),$

i.e., $D_i(x)$ takes the minimizing value of d. For any fixed d, the set of x's satisfying
the equation $D_i(x) = d$ is without difficulty shown to be (B) measurable. Since
$D_i(x)$ assumes only a finite number of values in D*, it follows from Assumption 3
that $W(F, D_i(x))$ is measurable (K). Now

$\lim_{i \to \infty} W(F, D_i(x)) = W(F, \mathfrak{D}(x)),$

so that $W(F, \mathfrak{D}(x))$ is measurable (K).

We shall now prove that $\mathfrak{D}$ is a Bayes solution, i.e., that

(3.29)    $\rho(\xi) = r(\xi, \mathfrak{D}).$

In a similar way it can be proved that

(3.30)    $\rho^*(\xi) = r(\xi, \mathfrak{D}^0).$

If $\rho_0(\xi) = \rho(\xi)$, there can be no better decision function (from the point of
view of reducing the risk) than $\mathfrak{D}$, i.e., $\mathfrak{D}$ is a Bayes solution. Suppose then that

(3.31)    $\rho_0(\xi) > \rho(\xi).$

If (3.31) holds and $\mathfrak{D}$ is not a Bayes solution, there exists a decision function
$D_1$ such that

(3.32)    $r(\xi, D_1) < r(\xi, \mathfrak{D})$

and

(3.33)    $r(\xi, D_1) < \dfrac{\rho_0(\xi) + \rho(\xi)}{2}.$

Now $D_1$ must require that at least one observation be taken, else (3.33) could
not hold. Thus $\mathfrak{D}$ and $D_1$ both require at least one observation.
Suppose one observation is taken. Let $r(\xi, D \mid a)$ denote the conditional risk
of proceeding according to D when $\xi$ is the a priori distribution and a is the
first observation. For a given D we have that $r(\xi, D \mid a)$ is a function only of
$\xi_a$. In particular $r(\xi, \mathfrak{D} \mid a)$ and $r(\xi, D_1 \mid a)$ are functions only of $\xi_a$.

We can now apply to $r(\xi, \mathfrak{D} \mid a)$ and $r(\xi, D_1 \mid a)$ the same argument that
was applied above to $r(\xi, \mathfrak{D})$ and $r(\xi, D_1)$, and conclude again as follows:
whenever $\rho_0(\xi_a) = \rho(\xi_a)$ (when one takes no more observations according to $\mathfrak{D}$),
taking additional observations cannot diminish the conditional risk below
$r(\xi, \mathfrak{D} \mid a)$. ($D_1$ may require an additional observation without having

$r(\xi, D_1 \mid a) > r(\xi, \mathfrak{D} \mid a).$

This can happen when $\rho_0(\xi_a) = \rho^*(\xi_a)$.) Whenever $\rho_0(\xi_a) > \rho(\xi_a)$ (when $\mathfrak{D}$
requires us to take another observation) two cases may occur: either a) $D_1$ requires
us to take another observation, in which case its decision is the same as that of
$\mathfrak{D}$, or b) $D_1$ requires us to stop taking observations. There exists then another
decision function whose conditional risk is less than

$\dfrac{\rho_0(\xi_a) + \rho(\xi_a)}{2}.$

Both this decision function and $\mathfrak{D}$ require that another observation be taken.
We conclude that up to and including the first observation, $\mathfrak{D}$ coincides either
with $D_1$ or with another decision function $D_2$ whose risk is not greater than that
of $D_1$.

We continue in this manner for 2, 3, ... observations. The above argument is
always valid because of Assumption 4 and because the past history of the process
(the sequence of observations) enters only through the a posteriori probability.
Thus we conclude that for any positive integer k there exists a decision function
$D_k$ such that up to and including the k-th observation $\mathfrak{D}$ gives the same decision
as $D_k$ and the risk corresponding to $D_k$ does not exceed the risk corresponding
to $D_1$. Since $\lim_{k \to \infty} r(\xi, D_k) \ge r(\xi, \mathfrak{D})$, (3.32) cannot hold. Hence (3.29) holds and
$\mathfrak{D}$ is a Bayes solution.

For any probability measure $\xi$ on $\Omega$ one of the following three conditions
must hold:

(1) $\operatorname{Min}_d W(\xi, d) < r(\xi, D)$ for any D for which $n(x, D) \ge 1$.

(2) $\operatorname{Min}_d W(\xi, d) \le r(\xi, D)$ for all D for which $n(x, D) \ge 1$, and the equality
sign holds for at least one D with $n(x, D) \ge 1$.

(3) There exists a D with $n(x, D) \ge 1$ such that $\operatorname{Min}_d W(\xi, d) > r(\xi, D)$.

In view of Theorem 3.6, the conditions (1), (2) and (3) can be expressed by:
(1) $\rho_0(\xi) < \rho^*(\xi)$, (2) $\rho_0(\xi) = \rho^*(\xi)$ and (3) $\rho_0(\xi) > \rho^*(\xi)$, respectively.

We shall say that a probability measure $\xi$ on $\Omega$ is of the first type if it satisfies
(1), of the second type if it satisfies (2), and of the third type if it satisfies (3).
Since the a posteriori probability defined in (3.1) is also a probability measure
on $\Omega$, any a posteriori probability measure will be one of the three types
mentioned above.

We shall now prove the following characterization theorem:

Theorem 3.7.¹² A necessary and sufficient condition for a decision function
$d = D_0(x)$ to be a Bayes solution relative to a given a priori distribution $\xi_0$ is that
the following three relations be fulfilled for any sample point x, except perhaps
on a set whose probability measure is zero when $\xi_0$ is the a priori distribution in $\Omega$:

(a) For any $m < n(x, D_0)$, the a posteriori distribution $\xi(\omega \mid \xi_0, x_1, \ldots, x_m)$ is
either of the second or of the third type.

(b) For $m = n(x, D_0)$, the a posteriori distribution $\xi(\omega \mid \xi_0, x_1, \ldots, x_m)$ is either
of the first or the second type.

(c) For $m = n(x, D_0)$, we have

$\operatorname{Min}_d W(\xi_{x_1, \ldots, x_m}, d) = W(\xi_{x_1, \ldots, x_m}, D_0(x))$

where $\xi_{x_1, \ldots, x_m}$ stands for an a priori distribution that is equal to the a posteriori
distribution corresponding to $\xi_0, x_1, \ldots, x_m$.

Proof: We shall omit the proof of the sufficiency of the conditions (a), (b) 
and (c), since it is essentially the same as that of Theorem 3.6. To prove the 
necessity of these conditions, let d = D 0 ( x) be a decision function and let M* 
denote the set of all sample points x for which at least one of the relations (a), 
(b) and (c) is violated. First, we shall show tht M* is a set measurable (B). 
Let M\ be the set of all ar’s for which (a) is violated, M* the set of all x’a for 
which (b) is violated, and the set of all cc’s for which (c) is violated. Clearly. 
M* is shown to be measurable (B) if we can show that M*(i — 1, 2, 3) is meas¬ 
urable (B). Let ikf*(r = 1,2, • ■ • , ad inf) denote the subset of M* for which 
the first violation of the corresponding condition occurB for the sample Xi , «• * , x„ 
We merely have to show that M* r is measurable (B) for all i and r. The meas¬ 
urability of M% r follows from the fact that Mind W(£* lM , , Ir , d) and 

W[fa h .„ r , D 0 (x)} 

are functions of x measurable (B). To show the measurability of M* r and M% r , it 
is sufficient to show that the set of all samples xi, ■ ■ , x r for which fa„ is 
of type i(i = 1, 2, 3) is measurable (B). But this follows from the fact that 

Po(£xj.* r ) and p*(f*are functions of (xi, ■ , x r ) measurable (B). Hence, 

M* is proved to be measurable (B). 

For any x in $M^*$ let m(x) be the smallest positive integer such that at least
one of the relations (a), (b) and (c) is violated for the finite sample

$$x_1, x_2, \cdots, x_{m(x)}.$$

Clearly, if x is a point in $M^*$, then also any sample point y is in $M^*$ for which
$y_1 = x_1, \cdots, y_{m(x)} = x_{m(x)}$. Let $x^0$ be any particular sample point in $M^*$ and
let $r(\xi_0, D_0, x_1^0, \cdots, x_{m(x^0)}^0)$ denote the conditional risk when $\xi_0$ is the a priori

12 See also the proof of Lemma 1 in [4].





A. WALD AND J. WOLFOWITZ 


distribution in $\Omega$, $D_0$ is the decision function adopted and the first $m(x^0)$
observations are equal to $x_1^0, \cdots, x_{m(x^0)}^0$, respectively; i.e., $r(\xi_0, D_0, x_1^0, \cdots, x_{m(x^0)}^0)$
is the conditional expected value of $W(F, D_0(x)) + n(x, D_0)$, when $\xi_0$ is the
a priori distribution in $\Omega$, $D_0$ is the decision function adopted and $x_1^0, \cdots, x_{m(x^0)}^0$
are the first $m(x^0)$ observations.

Let $D_1(x)$ be the decision function determined as follows: for any x not in
$M^*$ we put $D_1(x) = D_0(x)$. For any x in $M^*$, let $n(x, D_1)$ be equal to the smallest
integer $n(x) \ge m(x)$ for which

$$\rho_0(\xi_{x_1, \cdots, x_{n(x)}}) = \rho(\xi_{x_1, \cdots, x_{n(x)}})$$

and the value of $D_1(x)$ is determined so that condition (c) of our theorem is
fulfilled. Since, for any positive integer m, the subset of $M^*$ where m(x) = m
is (B) measurable, $D_1(x)$ has the proper measurability properties. Applying
Theorem 3.6, we see that

(3.34) $r(\xi_0, D_1, x_1, \cdots, x_{m(x)}) = \rho(\xi_{x_1, \cdots, x_{m(x)}})$

for any x in $M^*$. On the other hand, since $D_0$ violates at least one of the
conditions (a), (b), and (c) at every point x in $M^*$, we have

(3.35) $r(\xi_0, D_0, x_1, \cdots, x_{m(x)}) > \rho(\xi_{x_1, \cdots, x_{m(x)}})$

for every x in $M^*$. If the probability measure of $M^*$ is positive when $\xi_0$ is the
a priori probability measure, the above two relations imply that

$$r(\xi_0, D_0) > r(\xi_0, D_1).$$

Thus, $D_0$ is not a Bayes solution and the proof of Theorem 3.7 is complete.
We shall now prove the following continuity theorem. 13
Theorem 3.8. Let $\{\xi_i\}$ (i = 0, 1, 2, ..., ad inf.) be a sequence of probability
measures on $\Omega$ such that

(3.36) $\lim_{i \to \infty} \xi_i(\omega) / \xi_0(\omega) = 1$ uniformly in $\omega$.

Then

(3.37) $\lim_{i \to \infty} \rho(\xi_i) = \rho(\xi_0)$.

Proof: It follows from (3.36) that for any $\epsilon > 0$, we have for almost all
values i

(3.38) $\xi_i(\omega) / \xi_0(\omega) < 1 + \epsilon$ and $\xi_0(\omega) / \xi_i(\omega) < 1 + \epsilon$ for all $\omega$.

Our theorem is an immediate consequence of (3.38) and Theorem 3.4.


8.r.t.. p , "" e ° ™ by g - w - Bro ' n * ni '■ i ” i “ dc<i >» w- 



BAYES SOLUTIONS 




A stronger continuity theorem is the following:

Theorem 3.8.1. Let $\{\xi_i\}$ (i = 0, 1, 2, ..., ad inf.) be a sequence of probability
measures on $\Omega$ such that

$$\lim_{i \to \infty} \xi_i(\omega) = \xi_0(\omega)$$

uniformly in $\omega$. Then (3.37) holds.

Proof: It follows from (3.11) that

$$\lim_{m \to \infty} \rho_m(\xi) = \rho(\xi)$$

uniformly in $\xi$. Hence it is sufficient to prove that, under the conditions of the
theorem,

$$\lim_{i \to \infty} \rho_m(\xi_i) = \rho_m(\xi_0)$$

for any m. Let $D_m(x)$ denote a decision function for which $n(x, D_m) \le m$ for
all x. It follows that, for a fixed m, $r(F, D_m)$ is bounded, uniformly in F and $D_m$
(Assumptions 3 and 4). From the hypothesis on $\{\xi_i\}$ it then follows that

$$\lim_{i \to \infty} r(\xi_i, D_m) = r(\xi_0, D_m)$$

uniformly in $D_m$. From this the desired result follows readily.

A class C of probability measures $\xi$ on $\Omega$ will be said to be convex if for any
two elements $\xi_1$ and $\xi_2$ of C and for any positive value $\lambda < 1$, the probability
measure $\xi = \lambda \xi_1 + (1 - \lambda) \xi_2$ is an element of C.

For any element $d_0$ of $D^*$, let $C_{i, d_0}$ denote the class of all probability measures
$\xi$ of type i (i = 1, 2, 3) for which $W(\xi, d_0) = \min_d W(\xi, d)$. Let $C_d$ denote the
set-theoretical sum of $C_{1,d}$ and $C_{2,d}$. We shall now prove the following theorem.
Theorem 3.9. For any element d, the classes $C_{1,d}$ and $C_d$ are convex.

Let $\xi_1$ and $\xi_2$ be two elements of $C_{1,d}$. Then for any decision function D(x)
which requires at least one observation we have

(3.39) $W(\xi_1, d) < r(\xi_1, D)$ and $W(\xi_2, d) < r(\xi_2, D)$.

Let $\xi = \lambda \xi_1 + (1 - \lambda) \xi_2$ where $\lambda$ is a positive number < 1. Clearly,

(3.40) $W(\xi, d) = \lambda W(\xi_1, d) + (1 - \lambda) W(\xi_2, d)$

and

(3.41) $r(\xi, D) = \lambda r(\xi_1, D) + (1 - \lambda) r(\xi_2, D)$.

From (3.39), (3.40) and (3.41) we obtain

(3.42) $W(\xi, d) < r(\xi, D)$ and $W(\xi, d) = \min_{d^*} W(\xi, d^*)$.

Hence $\xi$ is an element of $C_{1,d}$ and the convexity of $C_{1,d}$ is proved. The convexity
of $C_d$ can be proved in the same way by replacing < by $\le$ in (3.39) and (3.42).


14 See also Lemma 2 in [4].





We shall say that a set L of probability measures $\xi$ is a linear manifold if
for any two elements $\xi_1$ and $\xi_2$ of L, $\xi = \alpha \xi_1 + (1 - \alpha) \xi_2$ is also an element of
L for any real value $\alpha$ for which $\alpha \xi_1 + (1 - \alpha) \xi_2$ is a probability measure. A
linear manifold L will be said to be tangent to $C_d$ if the intersection of L and
$C_{2,d}$ is not empty, but the intersection of L and $C_{1,d}$ is empty.

For any decision function D(x) and for any element d of $D^*$, let L(D, d)
denote the linear manifold consisting of all $\xi$ which satisfy the equation

(3.43) $W(\xi, d) = r(\xi, D)$.

Theorem 3.10. Let $\xi_0$ be an element of $C_{2,d}$ and let $D_0(x)$ be a decision function
that requires at least one observation and is such that $W(\xi_0, d) = r(\xi_0, D_0)$. Then
the linear manifold $L(D_0, d)$ is tangent to $C_d$.

Proof: $\xi_0$ is obviously an element of $L(D_0, d)$. Thus the intersection of $L(D_0, d)$
and $C_{2,d}$ is not empty. For any element $\xi_1$ of $C_{1,d}$ we have $W(\xi_1, d) < r(\xi_1, D)$
for any D that requires at least one observation. Hence, $W(\xi_1, d) < r(\xi_1, D_0)$
and, therefore, $\xi_1$ cannot be an element of $L(D_0, d)$. This proves our theorem.


4. Applications to the case where $\Omega$ and $D^*$ are finite. In this section we shall
apply the general results of the preceding section to the following special case: the
space $\Omega$ consists of a finite number of elements, $F_1, \cdots, F_k$ (say), and the space
$D^*$ consists of the elements $d_1, \cdots, d_k$ where $d_i$ denotes the decision to accept
the hypothesis $H_i$ that $F_i$ is the true distribution. Let

(4.1) $W(F_i, d_j) = W_{ij} = 0$ for $i = j$ and $> 0$ for $i \ne j$.

We shall discuss only the cases k = 2 and k = 3. The extension to
k > 3 will be obvious. We shall first consider the case k = 2. In this case any
a priori distribution $\xi$ is represented by two numbers $g_1$ and $g_2$, where $g_i \ge 0$
is the a priori probability that $F_i$ is true (i = 1, 2). Thus, $g_1 + g_2 = 1$.
Let $\xi_i$ denote the a priori distribution corresponding to $g_i = 1$ (i = 1, 2). Clearly,
$\xi_i$ is an element of $C_{d_i}$. Because of Theorems 3.8 and 3.9,
$C_{d_1}$ and $C_{d_2}$ are closed and convex. Furthermore, we obviously have

(4.2) $g_2 W_{21} \le g_1 W_{12}$ for all $\xi$ in $C_{d_1}$

and

(4.3) $g_2 W_{21} \ge g_1 W_{12}$ for all $\xi$ in $C_{d_2}$.

Let $\xi_0 = (g_1^0, g_2^0)$ be the a priori distribution for which

(4.4) $g_2^0 W_{21} = g_1^0 W_{12}$.

It follows from (4.2) and (4.3) that there exist two positive numbers c' and c'' satisfying

(4.5) $0 < c' \le g_2^0 \le c'' < 1$





and such that the class $C_{d_1}$ consists of all $\xi$ for which $g_2 \le c'$, and the class $C_{d_2}$
consists of all $\xi$ for which $g_2 \ge c''$.

Thus, the following decision procedure will be a Bayes solution relative to
the a priori distribution $\xi = (g_1, g_2)$: If $g_2 \le c'$ or $\ge c''$, do not take any observations
and make the corresponding final decision. If $c' < g_2 < c''$, continue taking
observations until the a posteriori probability of $H_2$ is either $\ge c''$ or $\le c'$. If this a
posteriori probability is $\ge c''$, accept $H_2$, and if it is $\le c'$, accept $H_1$.

The a posteriori probability of $H_2$ after the first m observations have been
made is given by

(4.6) $g_{2m} = \dfrac{g_2\, p(x_1 \mid F_2) \cdots p(x_m \mid F_2)}{g_1\, p(x_1 \mid F_1) \cdots p(x_m \mid F_1) + g_2\, p(x_1 \mid F_2) \cdots p(x_m \mid F_2)}$.

If $c' < g_2 < c''$ and if the probability (under $F_1$ as well as under $F_2$) is zero that
$g_{2m} = c'$ or $= c''$ for some m, then it follows from Theorem 3.8 that the above
described Bayes solution is essentially unique; i.e., any other Bayes solution
can differ from the one given above only on a set whose probability measure
is zero under both $F_1$ and $F_2$.

Provided that at least one observation is made, one can easily verify that the
above described Bayes solution is identical with a sequential probability ratio
test for testing $H_1$ against $H_2$. The sequential probability ratio test is defined
as follows (see [3]): Two positive constants A and B (B < A) are chosen.
Experimentation is continued as long as the probability ratio

(4.7) $\dfrac{p_{2m}}{p_{1m}} = \dfrac{p(x_1 \mid F_2) \cdots p(x_m \mid F_2)}{p(x_1 \mid F_1) \cdots p(x_m \mid F_1)}$

satisfies the inequality $B < p_{2m}/p_{1m} < A$. If $p_{2m}/p_{1m} \ge A$, accept $H_2$. If $p_{2m}/p_{1m} \le B$, accept
$H_1$. The Bayes solution described above coincides with this probability ratio
test for properly chosen values of the constants A and B.

The results described above for k = 2 are essentially the same as those
contained in Lemmas 1 and 2 of an earlier publication [4] of the authors.
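
The choice of A and B can be made explicit. Since $g_{2m}$ in (4.6) is an increasing function of the ratio (4.7), the condition $g_{2m} \ge c''$ is the same as $p_{2m}/p_{1m} \ge g_1 c'' / [g_2 (1 - c'')]$, and likewise with c' in place of c''. The following Python sketch is offered only as an illustration and is not part of the original paper. It runs both stopping rules on the same observations. The function names, the unit-variance normal densities standing in for $p(x \mid F_i)$, and the numerical values of $g_1$, $g_2$, c', c'' are all assumptions made for the example, and the case $c' < g_2 < c''$ is taken for granted.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of N(mean, 1); a stand-in for p(x | F_i) in this example."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def sprt_constants(g1, g2, c_lo, c_hi):
    """Translate posterior cut-offs c' < c'' into ratio bounds B < A."""
    A = g1 * c_hi / (g2 * (1.0 - c_hi))
    B = g1 * c_lo / (g2 * (1.0 - c_lo))
    return A, B


def bayes_rule(xs, g1, g2, c_lo, c_hi, mean1=0.0, mean2=1.0):
    """Stop when the a posteriori probability of H2 leaves (c', c'')."""
    lik1 = lik2 = 1.0
    for m, x in enumerate(xs, start=1):
        lik1 *= normal_pdf(x, mean1)
        lik2 *= normal_pdf(x, mean2)
        g2m = g2 * lik2 / (g1 * lik1 + g2 * lik2)  # formula (4.6)
        if g2m >= c_hi:
            return "accept H2", m
        if g2m <= c_lo:
            return "accept H1", m
    return "undecided", len(xs)


def ratio_test(xs, A, B, mean1=0.0, mean2=1.0):
    """Stop when p_2m / p_1m leaves the interval (B, A), as in (4.7)."""
    lik1 = lik2 = 1.0
    for m, x in enumerate(xs, start=1):
        lik1 *= normal_pdf(x, mean1)
        lik2 *= normal_pdf(x, mean2)
        ratio = lik2 / lik1
        if ratio >= A:
            return "accept H2", m
        if ratio <= B:
            return "accept H1", m
    return "undecided", len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(1.0, 1.0) for _ in range(200)]
    A, B = sprt_constants(0.5, 0.5, 0.1, 0.9)
    print(bayes_rule(sample, 0.5, 0.5, 0.1, 0.9))
    print(ratio_test(sample, A, B))
```

Both functions should stop at the same observation with the same decision, apart from floating-point ties at a boundary, which have probability zero for continuous data.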

We shall now discuss the case k = 3. Any a priori distribution $\xi$ can be
represented by a point with the barycentric coordinates $g_1$, $g_2$ and $g_3$, where $g_i$ is
the a priori probability of $H_i$ (i = 1, 2, 3). The totality of all possible a priori
distributions $\xi$ will fill out the triangle T with the vertices $O_1$, $O_2$ and $O_3$ where
$O_i$ represents the a priori distribution corresponding to $g_i = 1$ (see Figure 1).

Clearly, the vertex $O_i$ is contained in $C_{d_i}$. Thus, because of Theorem 3.9,
$C_{d_i}$ (i = 1, 2, 3) is a convex subset of T containing the vertex $O_i$, as indicated
in Figure 1.

If one of the components of $\xi = (g_1, g_2, g_3)$ is zero, say $g_i = 0$, then $H_i$ can
be disregarded and the problem of constructing Bayes solutions reduces to the
previously considered case where k = 2. Thus, in particular, the determination
of the boundary points $P_1, P_2, \cdots, P_6$ of $C_{d_1}$, $C_{d_2}$ and $C_{d_3}$ which are on the
boundary of the triangle T, reduces to the previously considered case, k = 2.





It follows from Theorems 3.8 and 3.9 that the intersection of $C_{d_i}$ with any
straight line $T_i$ through $O_i$ is a closed segment. One endpoint of this segment
is, of course, $O_i$. Let $B_i$ denote the other endpoint. It follows from Theorem
3.7 that $B_i$ must be a point of $C_{2, d_i}$. Any interior point of $O_i B_i$ can be shown
to be an element of $C_{1, d_i}$. The proof of this is very similar to that of Theorem
3.9.

We shall now show how tangents to the sets $C_{d_1}$, $C_{d_2}$ and $C_{d_3}$ can be
constructed at the boundary points $P_1, P_2, \cdots, P_6$. Consider, for example, the
boundary point $P_1$ of $C_{d_1}$ that lies on the line $O_1 O_2$. Let $\xi_1$ be the a priori
distribution represented by the point $P_1$. Since the a priori probability of $H_3$ is zero
according to $\xi_1$, we can disregard $H_3$ in constructing Bayes solutions relative
to $\xi_1$. Let $D_1(x)$ be a sequential probability ratio test for testing $H_1$ against $H_2$

FIG. 1


which requires at least one observation and which is a Bayes solution relative
to $\xi_1$. Since $P_1$ is a boundary point, such a decision function $D_1$ exists. Thus, we
have

(4.8) $W(\xi_1, d_1) = r(\xi_1, D_1) = \inf_D r(\xi_1, D)$.

Let $\alpha_{ij}$ denote the probability of accepting $H_j$ when $H_i$ is true and $D_1$ is the
decision function adopted. Let, furthermore, $n_i$ denote the expected number
of observations required by the decision procedure when $H_i$ is true and $D_1$
is adopted. Then, for any a priori distribution $\xi = (g_1, g_2, g_3)$ we have

(4.9) $r(\xi, D_1) = \sum_{i,j} g_i W_{ij} \alpha_{ij} + \sum_i g_i n_i$

and

(4.10) $W(\xi, d_1) = \sum_i g_i W_{i1}$.


Thus, the linear manifold $L(D_1, d_1)$ is simply the straight line given by the
equation

(4.11) $\sum_i g_i W_{i1} = \sum_{i,j} g_i W_{ij} \alpha_{ij} + \sum_i g_i n_i$.

This straight line goes through $P_1$ and, because of Theorem 3.10, it is tangent
to $C_{d_1}$. Tangents at the other points $P_i$ can be constructed in a similar
way.

The convexity properties of the sets $C_{d_i}$ (i = 1, 2, ..., k) were established by
the authors prior to the more general results described in Sections 2 and 3 and
were stated by one of the authors in an address given at the Berkeley meeting
of the Institute of Mathematical Statistics, June, 1948. More general results
when $\Omega$ and $D^*$ are finite, admitting also non-linear cost functions, were obtained
later by Arrow, Blackwell and Girshick [2].

REFERENCES 

[1] A. Wald, "Foundations of a general theory of sequential decision functions,"
Econometrica, Vol. 15 (1947), pp. 279-313.

[2] K. J. Arrow, D. Blackwell, M. A. Girshick, "Bayes and minimax solutions of sequential
decision problems," Econometrica, Vol. 17 (1949), pp. 213-244.

[3] A. Wald, Sequential Analysis, John Wiley & Sons, New York, 1947.

[4] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio
test," Annals of Math. Stat., Vol. 19 (1948), pp. 326-339.

[5] S. Saks, Theory of the Integral, Hafner Publishing Company, New York.

[6] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), pp.
165-205.



ON THE DISTRIBUTIONS OF MIDRANGE AND SEMI-RANGE IN
SAMPLES FROM A NORMAL POPULATION

K. C. S. PILLAI

University of Travancore, Trivandrum.

1. Summary. In this paper the simultaneous distribution of midrange and 
semi-range has been obtained and used to derive the distributions of midrange 
and semi-range in samples taken from a normal population. 

2. Introduction. The concept of ordering a sample has given rise to innumerable
problems for statistical investigation. Several authors have contributed to
the study of ordered individuals and, in particular, to the study of extreme
individuals, their sum and difference in samples from a normal population. L. H. C.
Tippett [1] has studied the first four moments of the range and has tabled the
mean-range for sample size ranging from two to a thousand. Student [2] has
determined the nature of the distribution of range for particular sample sizes
by purely empirical methods. T. Hojo [3] has compared the standard error of
midrange to that of median and mean in normal samples. E. S. Pearson and
H. O. Hartley [4] have tabled the values of the probability integral of range
for sample size up to twenty. E. J. Gumbel [5], [6], [7] has established the
independence of the extreme values in large samples from population of unlimited
range and obtained the distributions of range and midrange. The asymptotic
distribution of range has also been investigated by G. Elfving [8]. J. F. Daly [9]
has devised a t-test adopting range in place of standard deviation in Student's t
and in a modified t-test E. Lord [10] has used range instead of standard deviation.
An extension to two populations of an analogue of Student's t-test using the
sample range has been worked out by John E. Walsh [11]. S. S. Wilks [12] has
given a complete and detailed account of the researches on order statistics and
also a number of suggestions regarding possibilities of utilising order statistics
in statistical inference. In this paper the distribution of midrange has been
developed as a series and a method of evaluating the probability integral for
semi-range based on an infinite series expansion for the normal probability
integral has been suggested.


3. Distributions of midrange and semi-range. Let

$$x_1 < x_2 < \cdots < x_n$$

be an ordered sample from a normal population with zero mean and unit standard
deviation. Then the joint distribution of $x_1$ and $x_n$, the lowest and highest
values respectively, is given by [13],





DISTRIBUTION OF MIDRANGE 




(1) $p(x_1, x_n) = [n(n-1)/2\pi] \left[ \int_{x_1}^{x_n} e^{-t^2/2}\, dt / \sqrt{2\pi} \right]^{n-2} e^{-(x_1^2 + x_n^2)/2}$.

Let

$$M = (x_1 + x_n)/2$$

and

$$W = (x_n - x_1)/2.$$

M is the midrange and W is the semi-range of the sample. From (1) the
simultaneous distribution of M and W reduces to

(2) $p(M, W) = [n(n-1)/\pi] \left[ \int_{M-W}^{M+W} e^{-t^2/2}\, dt / \sqrt{2\pi} \right]^{n-2} e^{-(M^2 + W^2)}$.

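It has been shown [14] that if

(3) $F(M, W) = \left[ \int_{M-W}^{M+W} e^{-t^2/2}\, dt / \sqrt{2\pi} \right]^k$,

then

(4) $F(M, W) = e^{-k(M^2 + W^2)/2} \left[ A_0^{(k)} + A_1^{(k)} M^2 + \cdots + A_i^{(k)} M^{2i} + \cdots \right]$,

where the $A_i^{(k)}$ coefficient is given by

(5) $2i A_i^{(k)} = k A_{i-1}^{(k)} - k \sqrt{2/\pi} \left[ A_{i-1}^{(k-1)} W + A_{i-2}^{(k-1)} W^3 / \Gamma(4) + \cdots + A_0^{(k-1)} W^{2i-1} / \Gamma(2i) \right]$.

Using expansion (4) equation (2) reduces to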

(6) $p(M, W) = [n(n-1)/\pi]\, e^{-n(M^2 + W^2)/2} \sum_{i=0}^{\infty} A_i^{(n-2)} M^{2i}$.

It is evident that the A's involve terms of the form

$$[\phi(W)]^s W^q e^{-mW^2/2}$$

where s, q, m are positive integers and

$$\phi(W) = \sqrt{2/\pi} \int_0^W e^{-t^2/2}\, dt.$$

Integrating (6) with respect to W

(7) $p(M) = [n(n-1)/\pi]\, e^{-nM^2/2} \sum_{i=0}^{\infty} B_i M^{2i}$

where

(8) $B_0 = \sqrt{\pi/2}\, I(n-2, 0, 2)$,

(9) $B_1 = [(n-2)/2] \left[ \sqrt{\pi/2}\, I(n-2, 0, 2) - I(n-3, 1, 3) \right]$,

(10) $B_2 = [(n-2)/2^2 \Gamma(3)] \left[ \sqrt{\pi/2}\, (n-2) I(n-2, 0, 2) - (2n-5) I(n-3, 1, 3) - (1/3) I(n-3, 3, 3) + \sqrt{2/\pi}\, (n-3) I(n-4, 2, 4) \right]$

where

(11) $I(s, q, m) = \sqrt{2/\pi} \int_0^{\infty} [\phi(x)]^s x^q e^{-mx^2/2}\, dx$.

Using the method of integration by parts, the evaluation of I(s, q, m) can be
reduced ultimately to that of I(p, 0, r) and this function for different values
of p and r is given in Table I.
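
The integrals in (11) can also be checked by direct numerical quadrature. The following Python sketch is an illustration only and is not the method of the paper, which evaluates the integrals through a series expansion. The function names, the composite Simpson rule, the truncation point `upper`, and the step count are assumptions of the example. It uses the identity $\phi(x) = \operatorname{erf}(x/\sqrt{2})$.

```python
import math


def phi(x):
    """phi(x) = sqrt(2/pi) * integral_0^x exp(-t^2/2) dt = erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))


def I(s, q, m, upper=12.0, steps=6000):
    """Composite Simpson approximation to I(s, q, m) of formula (11).

    The infinite range is truncated at `upper`; the integrand decays like
    exp(-m x^2 / 2), so the neglected tail is negligible for m >= 2.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps

    def f(x):
        return phi(x) ** s * x ** q * math.exp(-0.5 * m * x * x)

    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return math.sqrt(2.0 / math.pi) * total * h / 3.0


def B0(n):
    """Coefficient B_0 of formula (8)."""
    return math.sqrt(math.pi / 2.0) * I(n - 2, 0, 2)


def B1(n):
    """Coefficient B_1 of formula (9)."""
    return 0.5 * (n - 2) * (math.sqrt(math.pi / 2.0) * I(n - 2, 0, 2) - I(n - 3, 1, 3))


if __name__ == "__main__":
    for p, r in [(1, 2), (1, 4), (2, 2)]:
        print(p, r, I(p, 0, r))
    print(B0(3), B1(3))
```

For p = 1 the integral has the closed form $I(1, 0, r) = 2 \arctan(1/\sqrt{r}) / (\pi \sqrt{r})$, which gives an independent check on the quadrature.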


TABLE I

Values of Integrals I(p, 0, r) 1

  p       r = 2           r = 4           r = 6           r = 8

  1    0.277,063,21    0.147,583,62    0.100,735,97    0.076,490,19
  2    0.152,980,4     0.064,094,20    0.037,255,93    0.025,060,53
  3    0.098,373       0.033,453,6     0.016,808,71
  4    0.069,10        0.019,535,1     0.008,589,57
  5    0.051,44        0.012,325,5
  6    0.039,90        0.008,223,9
  7    0.031,94
  8    0.026,17



The first five B Coefficients for n ranging from 3 to 10 are tabled below.

TABLE II

Values of B Coefficients.

  n       B_0             B_1             B_2             B_3             B_4

  3    0.347,247,25    0.040,642,87    0.002,772,90    0.000,133,80    0.000,005,00
  4    0.191,732       0.058,751       0.010,906       0.001,460       0.000,153
  5    0.123,292       0.067,184       0.021,526       0.004,988       0.000,909
  6    0.086,60        0.070,93        0.033,23        0.011,20        0.002,97
  7    0.064,47        0.072,20        0.045,65        0.020,28        0.007,14
  8    0.050,01        0.072,09        0.057,22        0.032,21        0.014,59
  9    0.040,03        0.071,27        0.068,95        0.047,01        0.024,98
 10    0.032,80        0.069,97        0.080,31        0.064,66        0.040,51

1 The integrals have been evaluated by using (14). 





The accuracy obtained by keeping the first five terms in p(M) may be judged 
from the following values of the total probability calculated for small values 
of n. 


TABLE III. 


Total probability keeping the first five terms in p(M) 


Size of sample 

3 

4 

5 

6 

7 

Total probability 







Integrating (6) with respect to M, p(W) may be obtained. But p(W) involves
the integral $\phi(W)$ and to evaluate the probability integral of W expansions
for $\phi(W)$ and its powers have to be developed.

Since $\phi(W) = \sqrt{2/\pi} \int_0^W e^{-t^2/2}\, dt = \sqrt{2/\pi}\, W (1 - W^2/6 + \cdots)$,

a convenient expansion is given by

(12) $\sqrt{2/\pi} \int_0^W e^{-t^2/2}\, dt = \sqrt{2/\pi}\, W e^{-W^2/6} (1 + a_1 W^2 + \cdots + a_i W^{2i} + \cdots)$

where $a_i$ follows the recurrence relation

(13) $3(2i + 1) a_i - a_{i-1} = (-1)^i / [3^{i-1} \Gamma(i + 1)]$,

as may be seen by differentiating (12) with respect to W and equating the
coefficients of $W^{2i}$ on both sides. Again

(14) $[\phi(W)]^j = (2/\pi)^{j/2}\, W^j e^{-jW^2/6} S^j$

where

(15) $S = 1 + a_2 W^4 + a_3 W^6 + \cdots + a_i W^{2i} + \cdots$

and

(16) $S^j = 1 + K_2^{(j)} W^4 + K_3^{(j)} W^6 + \cdots$

where

(17) $K_i^{(j)} = \sum_s \binom{j}{s}\, s!\; a_1^{s_1} a_2^{s_2} \cdots a_i^{s_i} / (s_1!\, s_2! \cdots s_i!)$

and

(17a) $s_1 + 2 s_2 + \cdots + i s_i = i$, $\quad s_1 + s_2 + \cdots + s_i = s$.

Clearly $a_0 = 1$. In evaluating the K's summation with respect to s is first
performed, the values of $s_1, s_2, \cdots, s_i$ being obtained so as to satisfy the
relations (17a); and thereafter the values of the a's are substituted. It may be noted
that $a_1 = 0$. The K coefficients for j up to 8 and i up to 13 are given below.


TABLE IV 


$K_i^{(j)}$ Coefficients.


X 

2 

i 3 

4 j 5 

0 011,111,11 
0.022,222,22 
0.033,333,33 
0.044,444,44 
0.055,555,56 

0 066,666,67 

0 077,777,78 

0 088,888,89 

-0.0*35,273,369 
-0,0*70,546,737 
— 0.0*10,582,011 
-0.0*14,109,348 
-0 0*17,636,684 
—0,0*21,164,021 
-0.0*24,691,358 
-0.0*28,218,695 

0.044,091,711 -0 0*17,814,833 
0.0*21,164,021 ~ 0.041,401,493 
0.0*50,264,550 -0 0*28,800,029 
0.0*91,710,758 -0 .0*54,157,091 
0.0*14,550,265 -0 .0*87,292,080 
0.0*21,164,021 -0 .(FI 2,820,080 
0.0*29,012,346 -0.0*17,707,944 
0 0*38,095,238 -0.0*23,373,001 



i 

6 

7 

1 

0.0*10,087,459 

-0 0*38,065,882 

2 

0.0*13,059,860 

-0.0*78,305,957 

3 

0 0*49,870,764 

-0 0*35,414,321 

4 

0.0*12,515,888 

-0.0*96,195,746 

5 

0 0*25,264,163 

-0 0*20,323,918 

6 

0 0*44,603,642 

-0.0*36,960,883 

7 

0.0*71,905,926 

-0 0*60,836,892 

8 

0.040,854,319 

-0.0*93,258,365 


8 


0 0*14,772,299 
0.0 8 57,379,007 
0.0*37,246,865 
0.043,039,809 
0.0*33,614,797 
0.072,070,037 
0 0*13,654,992 
0.0*23,672,301 


IJ 

-0.0'47,7 70,889 
~0.Q» 32,240,004 
”0.0*26,934,251 
”0.0’ 10,793,811 
-0.0'30,234,979 
—0.0 T 08,563,784 
-0.0*13,520,252 
-0.0* 24,174,801 


3 


1 

2 

3 

4 

5 

6 

7 

8 






10 

11 

12 

13 

0 0’*14,640,444 
0.0*48,330,114 
0.0*21,506,514 
0 0 8 10,849,591 
0.0 s 36,260,639 
0 0 8 95,092,297 
0.0*21,247,442 
0.0*42,365,199 

-0.0**40,268,872 
-0.0**91,351,579 
-0.0*44,469,203 
—0.0 lo 87,178,260 
-0 0°32,719,538 
-0,0 9 93,120,388 
-0 0 8 22,112,968 
-0.0 8 46,218,579 

0.0**10,359,029 
0,0**43,595,840 
0.0**96,601,910 
0.0**72,767,557 
0.0**32,219,900 
0.0*10,472,881 
0.0*27,825,332 
0.0*64,147,144 

-0.0 lT lA4,53S,53t> 
-0,0*49,132,452 
-0.0**58,727,028 
-0.0**54,213,617 
-0.0**27,049,719 
-0.0**90,020,717 
—0.0**27,369,553 
-0 0**66,862,484 





Using (12) the probability integral for W can be evaluated with the help of 
tables of Incomplete Gamma Functions. 
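
The recurrence (13) and the coefficients of the powers of S in (16) lend themselves to exact computation. The Python sketch below is an illustration rather than the author's procedure. It takes $a_0 = 1$ as the starting value implied by (12), builds the $a_i$ as exact fractions, compares the series (12) with the error function, and obtains the coefficients $K_i^{(j)}$ by repeated polynomial multiplication instead of the multinomial sum (17). All function names are assumptions of the example.

```python
import math
from fractions import Fraction


def a_coefficients(n_terms):
    """Return [a_0, ..., a_{n_terms}] from recurrence (13), with a_0 = 1."""
    a = [Fraction(1)]
    for i in range(1, n_terms + 1):
        rhs = Fraction((-1) ** i, 3 ** (i - 1) * math.factorial(i))
        a.append((rhs + a[i - 1]) / (3 * (2 * i + 1)))
    return a


def phi_series(W, a):
    """Right-hand side of (12), truncated after len(a) terms."""
    s = sum(float(c) * W ** (2 * i) for i, c in enumerate(a))
    return math.sqrt(2.0 / math.pi) * W * math.exp(-W * W / 6.0) * s


def K_coefficients(j, a):
    """Coefficients of S**j in powers of W^2, truncated to len(a) terms.

    Entry i of the result is the coefficient of W^(2i), i.e. K_i^(j).
    """
    n = len(a)
    result = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(j):
        new = [Fraction(0)] * n
        for p, cp in enumerate(result):
            if cp == 0:
                continue
            for q, cq in enumerate(a):
                if p + q < n:
                    new[p + q] += cp * cq
        result = new
    return result


if __name__ == "__main__":
    a = a_coefficients(13)
    for i in range(2, 6):
        print(i, float(a[i]))
    K2 = K_coefficients(2, a)
    print(float(K2[4]))
```

With exact fractions there is no rounding in the coefficients themselves, so any entry of the K table can be regenerated and compared digit by digit.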

REFERENCES 

[1] L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a
normal population," Biometrika, Vol. 17 (1925), pp. 364-387.

[2] Student, "Errors in routine analysis," Biometrika, Vol. 19 (1927), pp. 161-164.

[3] T. Hojo, "Distribution of the median, quartiles and interquartile distance in samples
from a normal population," Biometrika, Vol. 23 (1931), pp. 316-360.

[4] E. S. Pearson and H. O. Hartley, "The probability integral of the range in samples
of n observations from a normal population," Biometrika, Vol. 32 (1942),
pp. 301-310.

[5] E. J. Gumbel, "Ranges and midranges," The Annals of Math. Stat., Vol. 15 (1944),
No. 4, pp. 414-422.

[6] E. J. Gumbel, "On the independence of the extremes in a sample," The Annals of
Math. Stat., Vol. 17 (1946), No. 1, pp. 78-80.

[7] E. J. Gumbel, "The distribution of the range," The Annals of Math. Stat., Vol. 18
(1947), No. 3, pp. 384-412.

[8] G. Elfving, "The asymptotical distribution of range in samples from a normal
population," Biometrika, Vol. 34 (1947), pp. 111-119.

[9] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," The
Annals of Math. Stat., Vol. 17 (1946), No. 1, pp. 71-74.

[10] E. Lord, "The use of range in place of standard deviation in the t-test," Biometrika,
Vol. 34 (1947), pp. 41-67.

[11] J. E. Walsh, "An extension to two populations of an analogue of Student's t-test
using the sample range," Annals of Math. Stat., Vol. 18 (1947), No. 2, pp. 280-286.

[12] S. S. Wilks, "Order Statistics," Am. Math. Soc. Bull., Vol. 54 (1948), pp. 6-50.

[13] S. S. Wilks, Mathematical Statistics, Princeton University Press, Princeton, 1943,
p. 90.

[14] K. C. S. Pillai, "A note on ordered samples," Sankhya, Vol. 8 (1948), Part 4, pp.
375-380.



THE IMPOSSIBILITY OF CERTAIN SYMMETRICAL BALANCED 
INCOMPLETE BLOCK DESIGNS 

By S. S. Shrikhande 
University of North Carolina 

Introduction and Summary. An arrangement of v varieties or treatments in
b blocks of size k, (k < v), is known as a balanced incomplete block design if
every variety occurs in r blocks and any two varieties occur together in $\lambda$ blocks.
These parameters obviously satisfy the equations

(1) $bk = vr$

(2) $\lambda(v - 1) = r(k - 1)$.

Fisher [1] has also proved that the inequality

(3) $b \ge v$, $r \ge k$

must hold. If v, b, r, k and $\lambda$ are positive integers satisfying (1), (2) and (3),
then a balanced incomplete block design with these parameters possibly exists,
but the actual existence of a combinatorial solution is not ensured. These
conditions are thus necessary but not sufficient for the existence of a design. Fisher
and Yates in their tables [2] have listed all designs with $r \le 10$ and given
combinatorial solutions, where known. A balanced incomplete block design in which
b = v, and hence r = k, is called a symmetrical balanced incomplete block
design. The impossibility of the symmetrical designs with parameters v = b = 22,
r = k = 7, $\lambda$ = 2 and v = b = 29, r = k = 8, $\lambda$ = 2 was first demonstrated
by Hussain [3], [4] essentially by the method of enumeration. The object of the
present note is to give an alternative simple proof of the impossibility of those
designs and to show that the only unknown remaining symmetrical design in
Fisher and Yates' tables, viz. v = b = 46, r = k = 10, $\lambda$ = 2, is definitely
impossible. Symmetrical designs with $\lambda \le 5$, r, k $\le 20$, which are impossible
combinatorially, are also listed.

1. A necessary condition for the existence of a symmetrical balanced incomplete
block design when v is even.

Theorem 1. A necessary condition for the existence of a symmetrical balanced
incomplete block design with parameters v, r and $\lambda$, where v is even, is that $r - \lambda$
be a perfect square.

Proof. Let $N = (n_{ij})$ be a square matrix of v rows and v columns where

(4) $n_{ij} = 1$ or 0

according as the i-th treatment does or does not occur in the j-th block. Put

(5) $B = NN'$




IMPOSSIBILITY OP CERTAIN BLOCK DESIGNS 




Since every treatment occurs in r blocks and every pair of treatments in $\lambda$
blocks, we have, if the design is possible,

(6) $B = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}$.

Subtracting the first column from all the other columns and then adding to
the first row all the other rows, we see that

(7) $|B| = [r + \lambda(v - 1)](r - \lambda)^{v-1} = r^2 (r - \lambda)^{v-1}$ from (2).

But from (5)

$$|B| = |N|^2.$$

Since $|N|$ is integral, it follows that $(r - \lambda)^{v-1}$ is the square of an integer, and
hence if v is even, $r - \lambda$ must be a perfect square.
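
Relations (6) and (7) can be verified on a small design that is known to exist. The Python sketch below is an illustration only. It builds the incidence matrix of a cyclic design from a base block, forms B = NN', and computes the determinant exactly. The helper names and the two base blocks, {0, 1, 3} modulo 7 and {1, 3, 4, 5, 9} modulo 11, are assumptions introduced for the example and do not appear in the paper.

```python
from fractions import Fraction


def incidence_matrix(v, base_block):
    """Cyclic design: block j is base_block + j (mod v).

    N[i][j] = 1 if treatment i occurs in block j, else 0.
    """
    return [[1 if (i - j) % v in base_block else 0 for j in range(v)]
            for i in range(v)]


def gram(N):
    """Return B = N N' as a list of lists."""
    v = len(N)
    return [[sum(N[i][t] * N[j][t] for t in range(len(N[0])))
             for j in range(v)] for i in range(v)]


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d  # a row swap changes the sign of the determinant
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


if __name__ == "__main__":
    N = incidence_matrix(7, {0, 1, 3})   # v = 7, r = k = 3, lambda = 1
    B = gram(N)
    print(B[0])
    print(det(B), det(N) ** 2)
```

For this design r = 3 and $\lambda$ = 1, so (7) gives $|B| = 3^2 \cdot 2^6$, and the same value must equal $|N|^2$.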

Corollary. The following symmetrical designs are impossible.

(A1)  v = b = 22    r = k = 7    $\lambda$ = 2
(A2)  v = b = 46    r = k = 10   $\lambda$ = 2
(A3)  v = b = 92    r = k = 14   $\lambda$ = 2
(A4)  v = b = 106   r = k = 15   $\lambda$ = 2
(A5)  v = b = 172   r = k = 19   $\lambda$ = 2
(A6)  v = b = 34    r = k = 12   $\lambda$ = 4.

As already mentioned in the introduction, the impossibility of (A1) has been
proved by Hussain [3], but for the design (A2) it was hitherto unknown whether
or not a solution is possible and it was left as a blank in the latest edition of
Fisher and Yates' tables.
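
The test of Theorem 1 is purely arithmetical and can be applied mechanically. The Python sketch below is an illustration, with function names chosen for the example. It checks that a triple (v, k, $\lambda$) satisfies relation (2) with b = v and r = k, and then applies the perfect-square condition when v is even.

```python
import math


def is_symmetric_parameter_set(v, k, lam):
    """Relation (2) with b = v and r = k: lam * (v - 1) = k * (k - 1)."""
    return 0 < lam and k < v and lam * (v - 1) == k * (k - 1)


def passes_theorem_1(v, k, lam):
    """Theorem 1: for even v, r - lam must be a perfect square.

    For odd v the theorem imposes no restriction, so True is returned.
    """
    if v % 2:
        return True
    n = k - lam
    return math.isqrt(n) ** 2 == n


if __name__ == "__main__":
    corollary = [(22, 7, 2), (46, 10, 2), (92, 14, 2),
                 (106, 15, 2), (172, 19, 2), (34, 12, 4)]
    for v, k, lam in corollary:
        print(v, k, lam, is_symmetric_parameter_set(v, k, lam),
              passes_theorem_1(v, k, lam))
```

Passing the test does not establish existence; the condition is necessary only.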


2. Application of method of Bruck and Ryser.

In a recent paper Bruck and Ryser [5] have proved the impossibility of some
finite projective planes with the help of the properties of matrices whose
elements are integers. Their method is immediately applicable to our own problem.

Let A and B be two symmetric matrices of order n with elements in the
rational field. The matrices A and B are congruent, written A ~ B, provided
there exists a nonsingular matrix C with elements in the rational field, such that
A = C'BC. The congruence of matrices satisfies the usual requirements of an
"equals" relationship.

If A is an integral symmetric matrix of order n and rank n, we can always
construct an integral diagonal matrix $D = (d_1, \cdots, d_n)$, where $d_i \ne 0$, i = 1,
2, ..., n, such that D ~ A. The number of negative terms, i, called the index
of A, is an invariant by Sylvester's Law.





Define $d = (-1)^i \delta$ where $\delta$ is the square-free positive part of $|A|$. Then since
$|B| = |C|^2 |A|$, d is another invariant of A.
Now let A be a nonsingular and symmetric integral matrix of order n. Let
$D_r$ be the leading principal minor determinant of order r and suppose that
$D_r \ne 0$ for r = 1, 2, ..., n. Define

(9) $C_p(A) = (-1, -D_n)_p \prod_{i=1}^{n-1} (D_i, -D_{i+1})_p$

for every odd prime p where $(m, m')_p$ is the Hilbert norm-residue symbol for
arbitrary non-zero integers m and m' and for every prime p. The following
two theorems are given in the collected works of Hilbert [6].

Theorem (A). If m and m' are integers not divisible by the odd prime p, then

(10) $(m, m')_p = +1$

(11) $(m, p)_p = (p, m)_p = (m/p)$,

where (m/p) is the Legendre symbol. Moreover, if $m \equiv m' \not\equiv 0 \bmod p$, then

(12) $(m, p)_p = (m', p)_p$.

Theorem (B). For arbitrary non-zero integers m, m', n, n' and for every prime p,

(13) $(-m, m)_p = +1$

(14) $(m, n)_p = (n, m)_p$

(15) $(mm', n)_p = (m, n)_p (m', n)_p$

(16) $(m, nn')_p = (m, n)_p (m, n')_p$.

From the above it is easy to prove that for p an odd prime and every positive
integer m,

(17) $(m, m + 1)_p = (-1, m + 1)_p$

(18) $\prod_{j=1}^{m} (j, j + 1)_p = ((m + 1)!, -1)_p$.

We can now state the fundamental Minkowski-Hasse Theorem [7].

Theorem (C). Let A and B be two integral symmetric matrices of order n and
rank n. Suppose further that the leading principal minor determinants of A and B
are different from zero. Then A ~ B if and only if A and B have the same
invariants i, d and $C_p$ for every odd prime p.

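3. A necessary condition for the existence of a symmetrical balanced
incomplete block design for any integer v.

Suppose the symmetrical design with parameters v, r and $\lambda$ exists. Then
with the previous definition of N and B,

$$B = NN' = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}.$$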





Subtracting the last row from the remaining rows and then subtracting the last
column from all the other columns, we get

(19) $Q = \begin{pmatrix} 2(r-\lambda) & (r-\lambda) & \cdots & (r-\lambda) & -(r-\lambda) \\ (r-\lambda) & 2(r-\lambda) & \cdots & (r-\lambda) & -(r-\lambda) \\ \vdots & \vdots & & \vdots & \vdots \\ (r-\lambda) & (r-\lambda) & \cdots & 2(r-\lambda) & -(r-\lambda) \\ -(r-\lambda) & -(r-\lambda) & \cdots & -(r-\lambda) & r \end{pmatrix}$.

Obviously Q ~ B. But B ~ I. Hence Q ~ I and, therefore, since Q and I satisfy
all the conditions of Theorem C, they must have the same invariants i, d and $C_p$.
Let $D_j$ denote the leading principal minor determinant of Q of order j. Then

(20) $D_j = (r - \lambda)^j (j + 1)$ for j = 1, 2, ..., v - 1

(21) and $D_v = |B| = r^2 (r - \lambda)^{v-1}$.

Then, omitting p for convenience,

$$C_p(Q) = (-1, -D_v)(D_{v-1}, -D_v) \prod_{j=1}^{v-2} (D_j, -D_{j+1}).$$

We use (10), ..., (18) in deriving the value of $C_p(Q)$.

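Now

$$\begin{aligned}
(-1, -D_v)(D_{v-1}, -D_v)
&= (-1, -r^2 (r-\lambda)^{v-1})\,(v (r-\lambda)^{v-1}, -r^2 (r-\lambda)^{v-1}) \\
&= (-1, -1)(-1, r^2)(-1, (r-\lambda)^{v-1})\,((r-\lambda)^{v-1}, r^2) \\
&\qquad \cdot ((r-\lambda)^{v-1}, -(r-\lambda)^{v-1})\,(v, r^2)\,(v, -(r-\lambda)^{v-1}) \\
&= (-1, (r-\lambda)^{v-1})\,(v, -(r-\lambda)^{v-1}) \\
&= (-1, (r-\lambda)^{v-1})\,(v, -1)\,(v, (r-\lambda))^{v-1}.
\end{aligned}$$

Also

$$\begin{aligned}
\prod_{j=1}^{v-2} (D_j, -D_{j+1})
&= \prod_{j=1}^{v-2} ((r-\lambda)^j (j+1), -(r-\lambda)^{j+1} (j+2)) \\
&= \Big\{ \prod_{j=1}^{v-2} ((r-\lambda)^j, -(r-\lambda)^{j+1})\,(j+1, -(j+2)) \Big\}\, S \\
&= S \prod_{j=1}^{v-2} ((r-\lambda)^j, -(r-\lambda)^j)\,((r-\lambda)^j, (r-\lambda))\,(j+1, j+2)\,(j+1, -1) \\
&= S \prod_{j=1}^{v-2} ((r-\lambda), (r-\lambda))^j\,(j+2, -1)\,(j+1, -1) \\
&= S \prod_{j=1}^{v-2} (r-\lambda, -1)^j\,(j+2, -1)\,(j+1, -1) \\
&= S\,(r-\lambda, -1)^{(v-1)(v-2)/2}\,((v-1)!, -1)\,(v!, -1) \\
&= S\,(r-\lambda, -1)^{(v-1)(v-2)/2}\,(v, -1),
\end{aligned}$$

where

$$\begin{aligned}
S &= \prod_{j=1}^{v-2} ((r-\lambda)^j, j+2)\,((r-\lambda)^{j+1}, j+1) \\
&= \prod_{j=1}^{v-2} ((r-\lambda)^j, j+2)\,((r-\lambda)^{j-1}, j+1) \\
&= \prod_{j=1}^{v-2} ((r-\lambda)^j, j+2) \prod_{j=0}^{v-3} ((r-\lambda)^j, j+2) \\
&= (r-\lambda, v)^{v-2}.
\end{aligned}$$

Hence

$$C_p(Q) = (r-\lambda, -1)^{v(v-1)/2}\,(v, -1)^2\,(r-\lambda, v)^{2v-3}$$

(22) $\qquad = (r-\lambda, -1)^{v(v-1)/2}\,(r-\lambda, v)^{2v-3}$.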

Hence we can enunciate the following theorem:

Theorem 2. A necessary condition for the existence of a symmetrical balanced
incomplete block design with parameters v, r and $\lambda$ is that

$$C_p(Q) = (r-\lambda, -1)_p^{v(v-1)/2}\,(r-\lambda, v)_p^{2v-3} = +1$$

for all odd primes p, where $(m, n)_p$ is the Hilbert norm-residue symbol.

When v is even we have seen that a necessary condition for the existence of
the design is that $r - \lambda$ be a perfect square. Then it is easily seen that

$$C_p(Q) = +1$$

for all odd primes p. Therefore, even if the design is really non-existent, its
impossibility cannot be proved by this method.

When, however, v is odd we can in many instances demonstrate the
impossibility of the design.

Consider the design

(A7) v = b = 29, r = k = 8, $\lambda$ = 2.

$$\begin{aligned}
C_p(Q) &= (6, -1)_p^{406}\,(6, 29)_p^{55} \\
&= (3, 29)_p\,(2, 29)_p \\
&= (29/3) \text{ for } p = 3 \\
&= (2/3) \text{ for } p = 3 \\
&= -1 \text{ for } p = 3.
\end{aligned}$$

Hence the design (A7) is impossible. As mentioned in the introduction, the
impossibility has already been demonstrated by Hussain [4] by a rather lengthy
method amounting to a complete exhaustion of all possibilities. The following
designs with $\lambda \le 5$ and r, k $\le 20$ can be similarly proved to be impossible by
applying Theorem 2.


$(A_8)$     $v = b = 137$    $r = k = 17$    $\lambda = 2$
$(A_9)$     $v = b = 67$     $r = k = 12$    $\lambda = 2$
$(A_{10})$  $v = b = 103$    $r = k = 18$    $\lambda = 3$
$(A_{11})$  $v = b = 53$     $r = k = 13$    $\lambda = 3$
$(A_{12})$  $v = b = 43$     $r = k = 15$    $\lambda = 5$
$(A_{13})$  $v = b = 77$     $r = k = 20$    $\lambda = 5$
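The arithmetic behind this list can be checked mechanically. The following is a minimal Python sketch, not part of the paper: the helper names `legendre` and `design_v` are illustrative, and the relation $\lambda(v - 1) = r(k - 1)$ for a symmetrical design is assumed as the standard parameter identity. It evaluates the two Legendre symbols used for the design with $v = 29$ and recovers $v$ from $r$ and $\lambda$ for each listed design.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


def design_v(r, lam):
    """v for a symmetrical design (r = k), from lam * (v - 1) = r * (r - 1)."""
    q, rem = divmod(r * (r - 1), lam)
    if rem:
        raise ValueError("no integer v for these parameters")
    return q + 1


# (29/3) = (2/3) = -1 is the step that rules out the design with v = 29.
print(legendre(29, 3), legendre(2, 3))

# (r, lambda) pairs of the designs discussed in the text.
for r, lam in [(8, 2), (17, 2), (12, 2), (18, 3), (13, 3), (15, 5), (20, 5)]:
    print(r, lam, design_v(r, lam))
```

The sketch only confirms the parameter sets and the final residue computation; it does not evaluate the full Hilbert-symbol condition of Theorem 2.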



IMPOSSIBILITY OF CERTAIN BLOCK DESIGNS 


111 


My thanks are due to Professor R. C. Bose under whose guidance this
research was carried out.


REFERENCES 

[1] R. A. Fisher, "An examination of the different possible solutions of a problem in
incomplete blocks," Annals of Eugenics, Vol. 10 (1940), pp. 52-75.

[2] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research (1949), Hafner Publishing Company, New York.

[3] Q. M. Hussain, "Impossibility of the symmetrical incomplete block design with λ = 2,
k = 7," Sankhyā, Vol. 7 (1948), pp. 317-322.

[4] Q. M. Hussain, "Symmetrical incomplete block design with λ = 2, k = 8 or 9," Bulletin
of the Calcutta Mathematical Society, XXXVII (1945), pp. 115-123.

[5] R. H. Bruck and H. J. Ryser, "Non-existence of certain finite projective planes,"
Canadian Jour. Math., Vol. 1 (1949), pp. 88-93.

[6] D. Hilbert, Gesammelte Abhandlungen I (Berlin, 1932), pp. 161-173.

[7] H. Hasse, "Über die Äquivalenz quadratischer Formen im Körper der rationalen Zahlen,"
J. Reine Angew. Math., Vol. 152 (1923), pp. 205-224.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


THE SAMPLING DISTRIBUTION OF THE RATIO OF TWO RANGES
FROM INDEPENDENT SAMPLES¹

Richard F. Link 
University of Oregon 

Let us consider a sample of $n$ ordered observations ($x_1 < x_2 < \cdots < x_n$)
drawn from a population with variance $\sigma^2$. Let $w = (x_n - x_1)/\sigma$. Let us consider
the joint sampling distribution of $w_1$ and $w_2$ for two samples, not necessarily the
same size, drawn from populations with the same variance. If the two samples
were drawn independently, then the joint sampling distribution of $w_1$ and $w_2$
may be written as the product of the sampling distributions of $w_1$ and $w_2$.

If we make the change of variable $r = w_1/w_2$, $w_2 = w$, and if $w$ is integrated
over its range of definition, the cumulative distribution of the ratio of two ranges
remains. This may be written as

$$F(R) = \int_0^R dr \int_0^\infty dw \; w \, h_2(w)\, h_1(wr), \tag{1}$$

where $h_1$ is the pdf for $w_1$ and $h_2$ is the pdf for $w_2$.

To obtain more explicit results, specific distribution functions may be
considered. The following table gives the sampling distribution of the ratio of two
ranges from independent samples for the indicated density functions $f(x)$. Notice
that for the normal distribution it was possible to obtain results only for some
special cases.

In Table 1 for $F(R)$, $w_1$ and $w_2$ represent ranges computed from samples of
size $n_1$ and $n_2$ respectively.

Notice that formula (1) for $F(R)$ is equivalent to the following expression:

$$\Pr(w_1/w_2 < R) = F(R) = \int_0^\infty dw_2 \int_0^{R w_2} dw_1 \; h_1(w_1)\, h_2(w_2).$$

The region of integration for the last expression is simply the region in the
$w_2$, $w_1$ plane to the right of the line $w_1 = R w_2$.

This integration was done numerically. Table 2 gives values of $R$ for all
combinations of $n_1$ and $n_2 \le 10$ and for $\alpha = .005, .01, .025, .05, .10$ such that

$$\Pr(w_1/w_2 < R) = \alpha$$

where $w_1$ and $w_2$ are ranges computed from samples of size $n_1$ and $n_2$ drawn from
1 This work was done under contract N6onr-218/xv with the Office of Naval Research. 


112 



RATIO OF TWO RANGES 


113 


normal populations with the same variance. It is believed that these values are
correct to within one place in the last reported figure.
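For the two smallest normal cases the critical values can be recovered without numerical integration. The sketch below is an illustration in Python, not the author's procedure. It assumes the closed forms as read from Table 1, namely $F(R) = (2/\pi)\tan^{-1} R$ for $n_1 = n_2 = 2$ and $F(R) = (6/\pi)\tan^{-1}\bigl(R/\sqrt{4 + 3R^2}\bigr)$ for $n_1 = 2$, $n_2 = 3$, and solves $F(R) = \alpha$ for $R$. The function names are hypothetical.

```python
import math


def crit_2_2(alpha):
    """R solving (2/pi) * atan(R) = alpha  (n1 = n2 = 2)."""
    return math.tan(math.pi * alpha / 2)


def crit_2_3(alpha):
    """R solving (6/pi) * atan(R / sqrt(4 + 3 R^2)) = alpha  (n1 = 2, n2 = 3)."""
    t = math.tan(math.pi * alpha / 6)
    # t = R / sqrt(4 + 3 R^2)  =>  R = 2 t / sqrt(1 - 3 t^2)
    return 2 * t / math.sqrt(1 - 3 * t * t)


for alpha in (0.005, 0.01, 0.025, 0.05, 0.10):
    print(alpha, round(crit_2_2(alpha), 4), round(crit_2_3(alpha), 4))
```

The printed values can be compared with the first two entries of the $n_1 = 2$ row in each block of Table 2.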

These tabled values may be used as critical values for testing the hypothesis
that two independent samples were drawn from normal populations with the
same variance. This test is therefore comparable to the F test. Some sort of

TABLE 1 

fix) 

HR) = Pr iwJvH < R) 

1 0 < x < 1 

, 1 ) 7 ?"'—T 711 (m + fl(ni - 111 

0 all other x 

[_ni H- 7 i 2 — 2 n\ + 7i2 — 1 

1 Rim — 1)' 
ni + ni 

e~ x 0 < x < oo 

0 x < 0 

f- 1)’+' 

[l+i + d+fl/afi + o 

—j==. — * < x < °c 

v27r 

2 

m m 2, 712 «= 2 - tan -1 R 

T 

6 , R 

7ii = 2, «2 == 3 — tan -1 —- 

* V4-f-3fl 2 

m = 3, n 2 - 2 ^ ^tan -1 -^3 + 4 

711 = 3j 712 = 3 

r K T27r (2 

1 d7- — < — (u tan -1 u — u tan -1 1 >) 

Jo L 2 " - \ r 

1 tPy \ ”1 

-I-;-- (tv tan -1 tv — u tan -1 v) -1-- tan' 1 2 ry ^ 

6 r'(l + r s ) r 1 J 

where 

u = (3(r» + 1)H w «. (7r> + 3)“* 

v - (4r s + 3)“‘ y - (3r 2 + 4)“» 


measure of the relative performance of these two tests seems desirable. An
attempt to measure the performance of this test relative to the F test was made by
comparing the tolerance intervals of the distribution of this ratio with those of
the F test.

The length of the interval containing the central $1 - 2\alpha$ proportion of the
distribution of $F$ was compared with a similar length for the distribution of
$w_1/w_2$ for $n_1 = n_2 = n$. The square of the ratio of these lengths will be called $\delta^2$.












TABLE 2

$\Pr(w_1/w_2 < R) = .005$

n_1 \ n_2     2       3       4       5       6       7       8       9       10
    2       .0078   .0052   .0043   .0039   .0038   .0037   .0036   .0035   .0034
    3       .096    .071    .059    .054    .051    .048    .045    .042    .041
    4       .21     .16     .14     .13     .12     .12     .11     .11     .10
    5       .30     .24     .22     .20     .19     .18     .18     .17     .16
    6       .38     .32     .28     .26     .25     .24     .23     .22     .22
    7       .44     .38     .34     .32     .30     .29     .28     .27     .26
    8       .49     .43     .39     .36     .35     .33     .32     .31     .30
    9       .54     .47     .43     .40     .38     .37     .36     .35     .34
   10       .57     .50     .46     .44     .42     .40     .39     .38     .37




$\Pr(w_1/w_2 < R) = .01$

n_1 \ n_2     2       3       4       5       6       7       8       9       10
    2       .0157   .0105   .0080   .0070   .0068   .0066   .0063   .0062   .0061
    3       .136    .100    .084    .079    .073    .069    .065    .062    .060
    4       .26     .20     .18     .17     .16     .16     .14     .14     .13
    5       .38     .30     .26     .24     .23     .22     .21     .21     .20
    6       .46     .37     .33     .31     .29     .28     .27     .26     .20
    7       .53     .43     .39     .36     .34     .33     .32     .31     .30
    8       .59     .49     .44     .41     .39     .37     .36     .35     .34
    9       .64     .53     .48     .45     .43     .41     .40     .39     .38
   10       .68     .57     .52     .49     .46     .45     .43     .42     .41



$\Pr(w_1/w_2 < R) = .025$

n_1 \ n_2     2       3       4       5       6       7       8       9       10
    2       .039    .026    .019    .018    .017    .016    .016    .016    .016
    3       .217    .160    .137    .124    .116    .107    .102    .098    .096
    4       .37     .28     .26     .23     .21     .20     .19     .18     .18
    5       .50     .39     .34     .32     .30     .28     .27     .26     .26
    6       .60     .47     .42     .38     .36     .34     .33     .32     .31
    7       .68     .64     .48     .44     .42     .40     .38     .37     .36
    8       .74     .59     .53     .49     .46     .44     .43     .42     .41
    9       .79     .64     .57     .53     .50     .48     .47     .46     .44
   10       .83     .68     .61     .57     .54     .52     .50     .49     .48







TABLE 2 (Continued)

$\Pr(w_1/w_2 < R) = .05$

n_1 \ n_2     2       3       4       5       6       7       8       9       10
    2       .079    .052    .039    .036    .034    .032    .031    .030    .028
    3       .31     .23     .20     .18     .16     .15     .14     .14     .13
    4       .50     .37     .32     .29     .27     .26     .25     .24     .23
    5       .62     .49     .42     .40     .36     .35     .33     .32     .31
    6       .74     .57     .50     .46     .43     .41     .40     .38     .37
    7       .80     .64     .57     .52     .49     .47     .45     .44     .43
    8       .86     .70     .62     .57     .54     .51     .50     .48     .47
    9       .91     .75     .67     .61     .58     .55     .53     .52     .51
   10       .95     .80     .70     .65     .61     .59     .57     .55     .54




$\Pr(w_1/w_2 < R) = .10$

n_1 \ n_2     2       3       4       5       6       7       8       9       10
    2       .158    .105    .077    .074    .069    .066    .062    .059    .056
    3       .46     .33     .28     .25     .23     .22     .21     .20     .19
    4       .67     .49     .42     .38     .36     .34     .32     .31     .30
    5       .84     .62     .53     .48     .45     .43     .41     .39     .38
    6       .97     .72     .62     .56     .52     .50     .48     .46     .45
    7      1.07     .80     .69     .63     .59     .56     .54     .52     .50
    8      1.15     .87     .75     .68     .64     .61     .68     .56     .54
    9      1.21     .92     .80     .73     .68     .65     .62     .60     .58
   10      1.26     .98     .85     .77     .72     .68     .66     .64     .62



TABLE 3 


n 

Relative precision of the range as an 
estimate of a 


2 

1.00 

1.00 

3 

.99 

.99 

4 

.98 

.97 

5 

.96 

.95 

6 

.93 

.92 

7 

,91 

.90 

8 

.89 

.89 

9 

.87 

.88 

10 

.85 

.86 













116 


FRANK J. MASSEY, JH. 


For statistics having normal sampling distributions such a ratio would be
independent of $\alpha$ and would be equivalent to the ratio of the variances of these
sampling distributions. It was found that $\delta^2$ is independent of $\alpha$ except for a
maximum change of 1 in the second decimal for the values of $\alpha$ = .005, .01,
.025, .05, .10. These values of $\delta^2$ are presented in Table 3 along with the relative
precision of the range as an estimate of $\sigma$ as given by Mosteller [1].

It is interesting to note that $\delta^2$ corresponds very closely to the relative precision
of the range as an estimate of $\sigma$.

REFERENCE 

[1] F. Mosteller, "On some useful 'inefficient' statistics," Annals of Math. Stat., Vol. 17
(1946), pp. 377-408.


A NOTE ON THE ESTIMATION OF A DISTRIBUTION FUNCTION BY 

CONFIDENCE LIMITS 

By Frank J. Massey, Jr. 

University of Oregon 

Let $F(x)$ be the continuous, cumulative distribution function of a random
variable $X$, and let $x_1 < x_2 < x_3 < \cdots < x_n$ be the results of $n$ independent
observations on $X$ arranged in order of size. We wish to estimate $F(x)$ by means
of the band $S_n(x) \pm \lambda/\sqrt{n}$ where $S_n(x)$ is defined by

$$S_n(x) = \begin{cases} 0 & \text{if } x < x_1, \\ k/n & \text{if } x_k \le x < x_{k+1}, \\ 1 & \text{if } x \ge x_n. \end{cases}$$

Thus we wish to know the probability, say $P_n(\lambda)$, that the band is such that
$S_n(x) - \lambda/\sqrt{n} < F(x) < S_n(x) + \lambda/\sqrt{n}$ for all $x$. This problem has been
previously studied [1] [2] [3] [4] [5] and a limiting distribution has been obtained
[1] [4] [5] and tabled [3] [4]. However, apparently no error terms for the
limiting distribution, or practical methods of obtaining $P_n(\lambda)$, have been given. Such
a method is given here.

It has been shown [2] that $P_n(\lambda)$ is independent of $F(x)$ provided only that
$F(x)$ is continuous, and thus it is sufficient to consider only the case

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}$$

We will find the probability that $S_n(x)$ falls wholly in the band $F(x) \pm k/n$
(here $\lambda = k/\sqrt{n}$) where $k$ is an integer or a rational number, and intermediate
values may be obtained by interpolation. To illustrate the method we shall
assume that $k$ is an integer.



ESTIMATION OP A DISTRIBUTION FUNCTION 


117 


Divide the interval (0, 1) into $n$ parts by the points $1/n, 2/n, \ldots, (n-1)/n$.
The step function $S_n(x)$ rises by jumps of exactly $1/n$. Thus, in order to be
inside the band at $x = i/n$, $S_n(x)$ would have to pass through exactly one of the
lattice points whose ordinates are $(i-k+1)/n, (i-k+2)/n, \ldots, (i+k-1)/n$.
Suppose that the step function stays inside the band by means of $a_i$ of the
observations falling in the interval $\left(\frac{i-1}{n}, \frac{i}{n}\right)$. The a priori
probability of this happening is given by the multinomial law as

$$\Pr(a_1, \ldots, a_n) = \frac{n!}{a_1! \cdots a_n!}\left(\frac{1}{n}\right)^{a_1}\left(\frac{1}{n}\right)^{a_2}\cdots\left(\frac{1}{n}\right)^{a_n} = \frac{n!}{a_1! \cdots a_n!}\cdot\frac{1}{n^n}$$

since $\sum_{i=1}^{n} a_i = n$.

Thus the probability of the step function staying in the band is given by

$$P_n(\lambda) = \sum \frac{n!}{n^n}\cdot\frac{1}{a_1!\,a_2!\cdots a_n!} = \frac{n!}{n^n}\sum\frac{1}{a_1!\cdots a_n!}$$

where the summation is over all possible combinations of $a_1, \ldots, a_n$ such that
$\max_x |S_n(x) - x| < \lambda/\sqrt{n}$ and $\sum_{i=1}^{n} a_i = n$.

Let $U_i(m) = \sum \frac{1}{a_1! \cdots a_m!}$, $i = 1, 2, \ldots, 2k-1$, be the sum of all the
terms indicated such that $S_n(x)$ arrives at the lattice point $\left(\frac{m}{n}, \frac{m-k+i}{n}\right)$ by
a route that stays inside the band. Since $S_n(x)$ is non-decreasing it can only
pass through a point

$$\left(\frac{m+1}{n}, \frac{m-k+j+1}{n}\right), \quad m = 0, 1, \ldots, n-1;\ j = 1, 2, \ldots, 2k-1,$$

if it previously passed through one of the points

$$\left(\frac{m}{n}, \frac{m-k+1}{n}\right),\ \left(\frac{m}{n}, \frac{m-k+2}{n}\right),\ \ldots,\ \left(\frac{m}{n}, \frac{m-k+j+1}{n}\right).$$

If it passed through $\left(\frac{m}{n}, \frac{m-k+h}{n}\right)$ the value of $a_{m+1}$ would have to be
$(j+1-h)$ and the product $U_h(m)\cdot\frac{1}{(j+1-h)!}$ would be part of $U_j(m+1)$.
This is true for all $h = 1, 2, \ldots, j+1$ and all of these terms would give
different paths for $S_n(x)$ so we have

$$U_j(m+1) = \sum_{h=1}^{j+1}\frac{1}{(j+1-h)!}\,U_h(m), \quad j = 1, 2, \ldots, 2k-1,$$


where it is understood Uh{ m) = QHh'>m + k. 





Thus we have a set of 2k - 1 linear homogeneous difference equations. They 
may be reduced to a single difference equation by eliminating 2k — 2 of the 
variables by substitution. This results in the following difference equation. 


X (- If ~- hT h ~ Uk(2k - 1 - h 4* m) « 0. 

*-i ft i 


TABLE 1

  k \ n      5        10       20       25       30       35       40       45
  1.0      .0384    .0004
  1.5      .3276    .0449
  2.0      .6521    .2513    .0238
  2.5      .8880    .5139
  3.0      .9699    .7331    .2955
  3.5      .9947    .8522
  4.0      .99935   .9410    .6473
  5.0               .9922    .8624    .7637    .6629    .5074    .4808    .4042
  6.0               .9994    .9569    .9057    .8420    .7725    .7016    .0322
  7.0                        .9892    .9683    .9359    .8945    .8471    .7902
  8.0                        .9979    .9911    .9774    .9500    .9295    .8974
  9.0                        .9997    .9979    .0931    .9842    .9708    .9529

  k \ n      50       55       60       65       70       75       80
  5.0      .3377    .2807    .2324    .1918    .1577    .1294    .1000
  6.0      .5662    .5046    .4478    .3954    .3492    .3072    .2090
  7.0      .7439    .6916    .6403    .5908    .5435    .4987    .4580
  8.0      .8616    .8234    .7837    .7434    .7031    .6033    .0244
  9.0      .9312    .9063    .8789    .8496    .8189    .7874    .7551


Initial conditions on either the simultaneous equations or on the single
equation are

$$U_i(0) = 0 \text{ for } i \ne k,$$
$$U_k(0) = 1 \text{ for } i = k.$$

After values of $U_k(n)$ have been found the value of $P_n(\lambda)$ can be found by
multiplying $U_k(n)$ by $\frac{n!}{n^n}$.

Values of $U_k(n)$ can be obtained numerically either from the simultaneous
equations or from the single equation. Table 1 was computed partly by numerical
solution of the simultaneous equations above and partly by setting up similar
equations connecting $U_i(x + 5)$ to $U_t(x)$, $t = 1, 2, \ldots, i + 5$. Either method
could be set up on punch cards if an extensive table was desired. Notice that
to get $U_k(n)$ all $U_k(t)$, $t = 1, 2, \ldots, n - 1$ are also found. Table 1 gives some
computed values of $P_n(\lambda)$. Table 2 gives results interpolated from Table 1,
showing the approach of $P_n(\lambda)$ to its limiting distribution.
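The simultaneous equations lend themselves to direct computation. The following Python sketch is an illustration rather than the author's punch-card procedure. It assumes integer $k$, starts from $U_k(0) = 1$ with all other $U_i(0) = 0$, applies $U_j(m+1) = \sum_{h=1}^{j+1} U_h(m)/(j+1-h)!$ for $m = 0, \ldots, n-1$ with $U_h(m)$ taken as zero for $h$ outside $1, \ldots, 2k-1$, and finally multiplies $U_k(n)$ by $n!/n^n$. The function name `band_probability` is hypothetical.

```python
from math import factorial


def band_probability(n, k):
    """Probability that S_n(x) stays inside F(x) +/- k/n, for integer k >= 1."""
    width = 2 * k - 1
    # u[j] holds U_j(m) for j = 1..2k-1; u[0] is unused and u[2k] stays zero.
    u = [0.0] * (width + 2)
    u[k] = 1.0                                  # initial condition U_k(0) = 1
    for _ in range(n):
        new = [0.0] * (width + 2)
        for j in range(1, width + 1):
            # a_{m+1} = j + 1 - h observations fall in the next interval
            new[j] = sum(u[h] / factorial(j + 1 - h) for h in range(1, j + 2))
        u = new
    return u[k] * factorial(n) / n ** n


print(round(band_probability(5, 1), 4))
print(round(band_probability(10, 2), 4))
```

Floating-point arithmetic is adequate for the sample sizes of Table 1; for an exact check, the same loop could be run with `fractions.Fraction` in place of floats.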

If the width of the band is $2\left(\frac{k}{l\,n}\right)$ where $k$ and $l$ are integers, a similar
procedure to that above can be used. However, instead of dividing the interval
(0, 1) into $n$ parts it is necessary to divide it into $l \cdot n$ parts.


TABLE 2

  n \ λ     .90     1.0     1.10    1.20    1.30    1.40
   10       .66     .78     .85     .91     .95     .97
   20       .65     .77     .85     .91     .94     .97
   30       .65     .76     .85     .90     .94     .96
   40       .64     .76     .84     .90     .94     .96
   50       .64     .75     .84
   60       .63     .75     .84
   70       .63     .75     .83
   80       .63     .74
   ∞        .607    .730    .822    .888    .932    .960


It has been suggested [2] that instead of a band bounded by $y = x \pm c$ it
might be convenient to use a band bounded by the lines $y = px + q$ and
$y = p'x + q'$. If $p = p'$ and if $p$, $q$, $q'$ are rational the probability of $S_n(x)$ staying
inside the band can be evaluated by the method presented above. If $p \ne p'$
and if $p$, $p'$, $q$, $q'$ are all rational a similar procedure could be used but it would
be very tedious.

[1] N. Smirnov, "Sur les écarts de la courbe de distribution empirique," Recueil Math.
de Moscou, Vol. 6 (1939), pp. 3-26.

[2] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions,"
Annals of Math. Stat., Vol. 10 (1939), pp. 106-118.

[3] N. Smirnov, "On the estimation of the discrepancy between empirical curves of
distribution for two independent samples," Bulletin Mathématique de l'Université
de Moscou, Vol. 2 (1939), fasc. 2.

[4] A. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Ist.
Ital. Att. Giorn., Vol. 4 (1933), pp. 1-11.

[5] W. Feller, "On the Kolmogorov-Smirnov limit theorems for empirical distributions,"
Annals of Math. Stat., Vol. 19 (1948), pp. 177-190.




120 


FREDERICK HOSTELLER AND JOHN W. TUKEY 


SIGNIFICANCE LEVELS FOR A k-SAMPLE SLIPPAGE TEST 1 
Frederick Mosteller and John W. Tukey 
Harvard University and Princeton University

1. Summary. Mosteller has recently [1, 1948] proposed a $k$-sample slippage
test and has given percentage points for selected $n$, $k$ and $r$ for the case of $k$
equal samples of size $n$. When the samples are of unequal size, exact significance
levels can be calculated very quickly from

$$P_r = \frac{\sum_i n_i^{(r)}}{N^{(r)}}, \quad \text{where } x^{(r)} = x(x - 1) \cdots (x - r + 1),$$

by the method explained in section 3 below.

The significance values for $k$ equal samples of $n > 10$ are very well
approximated by

$$P_r = \frac{1}{k^{r-1}}\, e^{-r(r-1)(k-1)/2N},$$

where $N = kn$.

A convenient rough approximation for unequal samples may be given in
terms of $k^*$, an "effective" number of samples, which is given by

$$k^* = \frac{\left(\sum n_i\right)^2}{\sum n_i^2};$$

the one-sided significance level will then be approximately given by

$$P_r = (k^*)^{-(r-1)}.$$

This approximation can be easily applied with the aid of Table 1. Thus, for
example, with four samples of sizes 7, 5, 5, 2, we have

$$k^* = \frac{(7 + 5 + 5 + 2)^2}{49 + 25 + 25 + 4} = \frac{361}{103} = 3.50,$$

whence from the table $r = 3$ lies at a one-sided level approximately between
5% and 10%, $r = 4$ approximately between 1% and 2.5%, $r = 5$ between 0.5%
and 1%, $r = 6$ near 0.2%, and so on. Direct calculation yields 5.7%, 1.2%, 0.2%
and 0.03%. The approximation is, in this example, quite satisfactory for
moderate significance levels and conservative for more extreme significance levels.
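The worked example can be reproduced in a few lines. The Python sketch below is illustrative only (the function names are not from the paper): `exact_level` evaluates $P_r = \sum n_i^{(r)} / N^{(r)}$ directly, and `rough_level` evaluates $(k^*)^{-(r-1)}$ with $k^* = (\sum n_i)^2 / \sum n_i^2$.

```python
def falling(x, r):
    """Factorial power x^(r) = x (x - 1) ... (x - r + 1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out


def exact_level(sizes, r):
    """Exact one-sided level P_r for the given sample sizes."""
    return sum(falling(n, r) for n in sizes) / falling(sum(sizes), r)


def effective_k(sizes):
    """Effective number of samples k*."""
    return sum(sizes) ** 2 / sum(n * n for n in sizes)


def rough_level(sizes, r):
    """Rough approximation P_r = (k*)^-(r-1)."""
    return effective_k(sizes) ** -(r - 1)


sizes = [7, 5, 5, 2]
print(round(effective_k(sizes), 2))
for r in (3, 4, 5, 6):
    print(r, round(100 * exact_level(sizes, r), 2), round(100 * rough_level(sizes, r), 2))
```

The exact percentages agree with the 5.7%, 1.2%, 0.2% and 0.03% quoted above, and the rough values fall inside the brackets read from Table 1.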


2. Derivation. The statistic considered by Mosteller is the number of cases
in one sample greater than all cases in all the $k - 1$ other samples. We derive
its distribution briefly.

Since the statistic depends only on the order of the $n_1 + n_2 + \cdots + n_k = N$
1 Prepared in connection with research sponsored by the Office of Naval Research. 





values, we can consider the actual values taken on to be fixed, and consider their
allotment to the various samples. Assuming all of them to come from a single
continuous distribution, we may consider these fixed values to be all distinct,
and any way of allotting them to labelled places in the various samples as equally
likely.

Consider the $r$ largest values. They can all be allotted to places in the $i$-th
sample in $n_i^{(r)} = n_i(n_i - 1) \cdots (n_i - r + 1)$ ways, and to arbitrary places
in $N^{(r)}$ ways. Thus they will be allotted to some single sample in the fraction

$$P_r = \frac{\sum_i n_i^{(r)}}{N^{(r)}}$$

of all cases. This is clearly the probability that Mosteller's statistic is $r$ or more.

TABLE 1

Approximate critical values of $k^*$ for various levels of significance

One-sided level    10%      5%       2.5%     1%       0.5%     0.2%     0.1%
Two-sided level    20%      10%      5%       2%       1%       0.4%     0.2%

r = 2              10.0     20.0     40.0     100.0    200.0    500.0    1000.0
r = 3              3.2      4.5      6.3      10.0     14.1     22.4     31.6
r = 4                       2.7      3.4      4.6      5.8      7.9      10.0
r = 5                                2.5      3.2      3.8      4.7      5.6
r = 6                                         2.5      2.9      3.5      4.0
r = 7                                                           2.8      3.2
r = 8                                                                    2.6
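The entries of Table 1 are consistent with inverting the rough approximation $P_r = (k^*)^{-(r-1)}$, that is, with $k^* = (1/P_r)^{1/(r-1)}$. This reading is an inference from the tabled values rather than a statement in the text, and the Python function name below is hypothetical.

```python
def critical_k_star(level, r):
    """k* at which (k*)^-(r-1) equals the given one-sided level (r >= 2)."""
    return (1.0 / level) ** (1.0 / (r - 1))


levels = (0.10, 0.05, 0.025, 0.01, 0.005, 0.002, 0.001)
for r in range(2, 9):
    print(r, [round(critical_k_star(p, r), 1) for p in levels])
```

Entries below about 2.5 are left blank in the printed table, so only the larger values in each row need to be compared.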


3. Unequal samples: an exact computation. Our practical problem is to
compute $P_r$ for small values of $r$ and a fixed set of $n_i$. If we recognize the
numerators as the unnormalized factorial moments of the distribution of sample
sizes, we see that the computation goes smoothly according to the scheme shown
in Table 2 (where the columns of multipliers $n - 1$, $n - 2$, $n - 3$, etc. may be
partially covered for convenience during the computation). For example:
132 = 11(12), 1320 = 10(132), ..., 42 = 6(7). The numbers in the last line of
Table 2 give successively the percentages $100 P_1$, $100 P_2$, .... Of course $P_1 = 1$
because some sample must have the largest value. It is clear that exact
computation for any reasonable set of $n_i$ is quite easy.
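The scheme of successive multiplications can be sketched as follows. This Python fragment is an illustration under the description above, not a reproduction of the printed Table 2: each column is obtained from the previous one by multiplying by $n_i - 1$, $n_i - 2$, and so on, and the column total is divided by the correspondingly built $N^{(r)}$. The function name is hypothetical.

```python
def slippage_percentages(sizes, max_r):
    """100 * P_r for r = 1..max_r, built by repeated multiplication."""
    terms = list(sizes)          # column r = 1 holds n_i^(1) = n_i
    total = sum(sizes)
    denom = total                # N^(1) = N
    out = []
    for r in range(1, max_r + 1):
        out.append(100.0 * sum(terms) / denom)
        # next column: multiply each entry by (n_i - r), e.g. 132 = 11 * 12
        terms = [t * (n - r) for t, n in zip(terms, sizes)]
        denom *= total - r       # N^(r+1) = N^(r) * (N - r)
    return out


sizes = [12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 7, 4]
print([round(p, 2) for p in slippage_percentages(sizes, 4)])
```

Once a sample size is exhausted its entry becomes zero and stays zero, which matches the convention that $n^{(r)} = 0$ for $r > n$.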

4. Equal samples: an approximation. In the case of $k$ equal samples, we have

$$P_r = \frac{k\, n^{(r)}}{N^{(r)}}.$$

Let us try to approximate to $n^{(r)}$ by expansion in powers. We have





$$n^{(r)} = n(n - 1) \cdots (n - r + 1) = n^r (1 - 1/n)(1 - 2/n) \cdots (1 - (r - 1)/n),$$

so that

$$\begin{aligned}
\log n^{(r)} &= r \log n + \sum_{x=1}^{r-1} \log (1 - x/n) \\
&= r \log n - \sum_{x=1}^{r-1} \left( x/n + x^2/2n^2 + x^3/3n^3 + \cdots \right) \\
&= r \log n - r(r - 1)/2n - r(r - 1)(2r - 1)/12n^2 + O(n^{-3}),
\end{aligned}$$

TABLE 2 
Sample Compulation 

for (n,} = (12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 7, 4) 











and finally 
(3) 


P -i }■ f '' u c { ~ r!r_l1 '‘“ 1J/Sw >< 1+tSr_ » /l! ^ 


5. Comparison of results. The results obtained with various equal sample
approximations will be compared with the exact values for several cases. The
effective number of samples, $k^*$, used with (1), (2), and (3), is computed from

$$k^* = \frac{N^2}{\sum n_i^2},$$

a formula which is often an easy and effective way to allow for different sizes of
samples.


TABLE 3

Comparison of Approximations

                                                              P in %
Sizes of Samples                  N      k*      r    exact    (1)      (2)      (3)      (4)
10, 10, 10, 10                    40     4.00    2    23.08    25.00    23.19    23.13    <25.00
                                                 3    4.85     6.25     4.99     4.80     <6.25
7, 5, 5, 2                        19     3.50    2    24.56    28.53    25.01    24.82    <28.53
                                                 3    5.67     8.14     5.48     5.18     <8.76
12, 11, 11, 11, 10, 10, 10, 10,   114    11.46   2    7.92     8.73     7.96     7.96     <8.73
9, 9, 7, 4                                       3    0.58     0.76     0.58     0.66     <0.78


A fourth approximation, which always gives a conservative estimate of the
significance of the result, is obtained by replacing $n^{(r)}$ by $n^r$ throughout; this gives

$$P_r = \frac{\sum n_i^r}{N^r}, \tag{4}$$

which is equivalent to approximation (1) when the samples are of equal size, or
when $r = 2$.

The results are shown in Table 3. 
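Several columns of Table 3 can be recomputed directly. The Python sketch below is illustrative; it assumes that column (1) is the rough approximation $(k^*)^{-(r-1)}$ and column (2) is the exponential approximation of the Summary with $k$ replaced by $k^*$, an identification that is consistent with the tabled figures but is not spelled out in the surviving text. Column (3) is not attempted, and the function names are hypothetical.

```python
import math


def falling(x, r):
    """Factorial power x^(r) = x (x - 1) ... (x - r + 1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out


def compare(sizes, r):
    """Percentages: exact, approximation (1), approximation (2), approximation (4)."""
    N = sum(sizes)
    k_star = N * N / sum(n * n for n in sizes)
    exact = sum(falling(n, r) for n in sizes) / falling(N, r)
    a1 = k_star ** -(r - 1)
    a2 = a1 * math.exp(-r * (r - 1) * (k_star - 1) / (2 * N))
    a4 = sum(n ** r for n in sizes) / N ** r
    return [100 * v for v in (exact, a1, a2, a4)]


for sizes in ([10, 10, 10, 10], [7, 5, 5, 2],
              [12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 7, 4]):
    for r in (2, 3):
        print(sizes, r, [round(v, 2) for v in compare(sizes, r)])
```

The output can be set against the exact, (1), (2) and (4) columns of Table 3; small differences in the last figure are to be expected from rounding.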

Thus it seems clear that either (1) or (4) is good enough for rough work.
The choice will depend on which formula one prefers to remember. The amount
of work is about the same for either method. When something better is required
the exact method of section 3 seems appropriate. Indeed some may prefer it to
any approximation.

REFERENCE

[1] Frederick Mosteller, "A k-sample slippage test for an extreme population," Annals
of Math. Stat., Vol. 19 (1948), pp. 58-65.



124 


JACK SHERMAN AND WINIFRED J MORRISON 


ADJUSTMENT OF AN INVERSE MATRIX CORRESPONDING TO A 
CHANGE IN ONE ELEMENT OF A GIVEN MATRIX 

By Jack Sherman and Winifred J. Morrison 
The Texas Company Research Laboratories, Beacon, New York

1. Introduction. Many methods have been published in recent years for
carrying out the numerical computation of the inverse of a matrix [1], [2]. In all these
methods, the amount of computation increases rapidly with increase in order
of the matrix.

The utility of a computational method for obtaining the inverse of a matrix
would be increased considerably if the inverse could be transformed in a simple
manner, corresponding to some specified change in the original matrix, thus
eliminating the necessity of computing the new inverse from the beginning.
The problem that is considered in the present paper is one of changing one
element in the original matrix, and of computing the resulting changes in the
elements of the new inverse directly from those of the old inverse.


2. Computational method. Let

$a_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$ denote the elements of an $n$th order
square matrix $a$;

$b_{ij}$ denote the elements of $b$, the inverse of $a$;

$A_{ij}$ denote the elements of $A$, which differs from $a$ only in one element, say $A_{RS}$;

$B_{ij}$ denote the elements of $B$, the inverse of $A$.

Let

$$A_{RS} = a_{RS} + \Delta a_{RS}.$$

The set of equations by means of which $B$ may be computed from $\Delta a_{RS}$
and $b$ is

$$B_{rj} = b_{rj} - \frac{b_{rR}\, b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}}, \quad r = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n, \tag{1}$$

provided that $1 + b_{SR}\, \Delta a_{RS} \ne 0$.

The validity of equation (1) may be demonstrated by multiplying through
by $A_{ir}$, $(r = 1, 2, \ldots, n)$ and adding the results:

$$\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} A_{ir} b_{rj} - \frac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} A_{ir} b_{rR}, \quad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n). \tag{2}$$

Consider separately the equations for which $i \ne R$, and for which $i = R$.

Case I. $i \ne R$. By hypothesis, $A_{ir} = a_{ir}$ for $i \ne R$. Hence equations (2) become

$$\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj} - \frac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} a_{ir} b_{rR}, \quad (i = 1, 2, \ldots, R-1, R+1, \ldots, n;\ j = 1, 2, \ldots, n). \tag{3}$$





The last sum vanishes because $a$ and $b$ are inverse matrices, and hence

$$\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}, \quad (i = 1, 2, \ldots, R-1, R+1, \ldots, n;\ j = 1, 2, \ldots, n). \tag{4}$$

Case II. $i = R$. Equation (2) becomes

$$\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} A_{Rr} b_{rj} - \frac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} A_{Rr} b_{rR}, \quad (j = 1, 2, \ldots, n). \tag{5}$$

In each of the summations, there will be a term for which $r = S$, in which
case $A_{RS} = a_{RS} + \Delta a_{RS}$. In all other cases, $A_{Rr} = a_{Rr}$. Hence (5) can be written as

$$\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} a_{Rr} b_{rj} + \Delta a_{RS}\, b_{Sj} - \left(\frac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}}\right)\left(\sum_{r=1}^{n} a_{Rr} b_{rR} + \Delta a_{RS}\, b_{SR}\right), \quad (j = 1, 2, \ldots, n). \tag{6}$$

Since $a$ and $b$ are inverse matrices, the second summation on the right-hand
side of (6) is equal to unity, and hence (6) becomes

$$\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} a_{Rr} b_{rj}, \quad (j = 1, 2, \ldots, n). \tag{7}$$

The sets of equations (4) and (7) can be written as one set of equations:

$$\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}, \quad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n), \tag{8}$$

and hence $B$ is the inverse of $A$.


3. Illustrative numerical example. In actual applications, equations (1) are
conveniently subdivided into three groups, namely, those for which $r = S$,
those for which $j = R$, and all others. In the first two cases, these reduce to

$$B_{Sj} = \frac{b_{Sj}}{1 + b_{SR}\, \Delta a_{RS}}, \quad (j = 1, 2, \ldots, n), \tag{9}$$

$$B_{rR} = \frac{b_{rR}}{1 + b_{SR}\, \Delta a_{RS}}, \quad (r = 1, 2, \ldots, n). \tag{10}$$

By utilizing (10), (1) becomes

$$B_{rj} = b_{rj} - B_{rR}\, b_{Sj}\, \Delta a_{RS}, \quad (r = 1, 2, \ldots, S-1, S+1, \ldots, n;\ j = 1, 2, \ldots, R-1, R+1, \ldots, n). \tag{11}$$

Equations (9) and (10) show that the elements of $B$ contained in the $S$th
row and $R$th column are directly proportional to the corresponding elements of $b$.





Consider

$$a = \begin{pmatrix} 2.384 & 1.238 & 0.861 & 2.413 \\ 0.648 & 1.113 & 0.761 & 0.137 \\ 1.119 & 0.643 & 3.172 & 1.139 \\ 0.745 & 2.137 & 1.268 & 0.542 \end{pmatrix}.$$

The inverse $b$ turns out to be

$$b = \begin{pmatrix} 0.2220 & 2.5275 & -0.1012 & -1.4145 \\ -0.04806 & -0.2918 & -0.1999 & 0.7079 \\ -0.1692 & 0.01195 & 0.3656 & -0.01824 \\ 0.2801 & -2.3517 & 0.07209 & 1.0409 \end{pmatrix}.$$

Assume that $a_{24}$ is increased by 0.4, so that

$$A = \begin{pmatrix} 2.384 & 1.238 & 0.861 & 2.413 \\ 0.648 & 1.113 & 0.761 & 0.537 \\ 1.119 & 0.643 & 3.172 & 1.139 \\ 0.745 & 2.137 & 1.268 & 0.542 \end{pmatrix}.$$

Then (9), (10), and (11) become

$$B_{4j} = \frac{b_{4j}}{1 - 2.3517 \times 0.4} = 16.857\, b_{4j}, \quad (j = 1, 2, \ldots, n),$$

$$B_{r2} = 16.857\, b_{r2}, \quad (r = 1, 2, \ldots, n),$$

$$B_{rj} = b_{rj} - 0.4\, B_{r2}\, b_{4j}, \quad (r = 1, 2, \ldots, S-1, S+1, \ldots, n;\ j = 1, 2, \ldots, R-1, R+1, \ldots, n).$$

Utilization of these equations gives

$$B = \begin{pmatrix} -4.5518 & 42.608 & -1.3298 & -19.155 \\ 0.5031 & -4.9191 & -0.05805 & 2.7560 \\ -0.1919 & 0.2014 & 0.3598 & -0.1021 \\ 4.7218 & -39.644 & 1.2153 & 17.547 \end{pmatrix}.$$
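The adjustment is easily programmed. The following Python sketch applies equation (1) to the inverse $b$ of the example; the function name `adjust_inverse` is illustrative and the indices are 0-based, so the change to $a_{24}$ corresponds to `R=1`, `S=3`.

```python
def adjust_inverse(b, R, S, delta):
    """Inverse of A, where A equals a except that A[R][S] = a[R][S] + delta.

    b is the inverse of a; R and S are 0-based row and column indices.
    """
    denom = 1.0 + b[S][R] * delta
    if denom == 0.0:
        raise ZeroDivisionError("1 + b_SR * delta = 0, so A is singular")
    n = len(b)
    return [[b[r][j] - b[r][R] * b[S][j] * delta / denom for j in range(n)]
            for r in range(n)]


b = [[0.2220, 2.5275, -0.1012, -1.4145],
     [-0.04806, -0.2918, -0.1999, 0.7079],
     [-0.1692, 0.01195, 0.3656, -0.01824],
     [0.2801, -2.3517, 0.07209, 1.0409]]

B = adjust_inverse(b, R=1, S=3, delta=0.4)   # a_24 increased by 0.4
for row in B:
    print(["%.4f" % x for x in row])
```

Because $b$ is given only to four or five figures and the divisor $1 - 2.3517 \times 0.4$ is small, the computed $B$ agrees with the printed one only to about four significant figures.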


4. Concluding remarks. It is seen from equation (1) that if $\Delta a_{RS} = -1/b_{SR}$,
that is, if $a_{RS}$ is increased by the negative of the reciprocal of the corresponding
element in the transposed reciprocal matrix, then the denominator in the second
term on the right-hand side of equation (1) becomes equal to zero, and $B$ cannot
be found by the present method. It is left to the reader to verify that under
these conditions $A$ is in fact singular.

In the illustrative numerical example, the denominator is only $1 - 2.3517 \times 0.4 = 0.0593$,
which accounts for the large magnitude of some of the elements
of $B$. If $\Delta a_{24}$ were taken to be $1/2.3517 = 0.4252$ instead of 0.4, $A$ would have
become singular.
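The singular case can be checked numerically on the example. The Python sketch below is an illustration, not part of the paper. It relies on the standard fact that changing one element multiplies the determinant by $1 + b_{SR}\,\Delta a_{RS}$, and it uses a plain Gaussian-elimination determinant; the helper names are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n):
                m[i][j] -= f * m[c][j]
    return d


a = [[2.384, 1.238, 0.861, 2.413],
     [0.648, 1.113, 0.761, 0.137],
     [1.119, 0.643, 3.172, 1.139],
     [0.745, 2.137, 1.268, 0.542]]


def det_ratio(delta):
    """det(A) / det(a) when a_24 is increased by delta."""
    A = [row[:] for row in a]
    A[1][3] += delta
    return det(A) / det(a)


print(round(det_ratio(0.4), 4))           # close to 1 - 2.3517 * 0.4
print(round(det_ratio(1 / 2.3517), 4))    # close to zero
```

With $\Delta a_{24} = 0.4$ the determinant shrinks to about 6% of its original value, which is the source of the factor 16.857 in the example.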

If two or more elements in the matrix a are to be changed, the new inverse can 
be found by successive applications of the method. 



A CLABS OF RANDOM VARIABLES 


127 


REFERENCES 

[1] H. Hotelling, "Some new methods in matrix calculation," Annals of Math. Stat.,
Vol. 14 (1943), pp. 1-35.

[2] P. S. Dwyer, "The solution of simultaneous equations," Psychometrika, Vol. 6 (1941),
p. 101.


A CLASS OF RANDOM VARIABLES WITH DISCRETE DISTRIBUTIONS 

Albert Noack 
Cologne, Germany 

1. General results. A large class of random variables with discrete probability
distributions can be derived from certain power series. Let

$$f(z) = \sum_{x=0}^{\infty} a_x z^x, \quad a_x \text{ real}, \quad |z| < r.$$

We may have either non-negative coefficients $a_x$ or we may have $(-1)^x a_x \ge 0$.
In the first case take $0 < z < r$; and in the second case take $-r < z < 0$. Define
a random variable $\xi$ with the distribution

$$P\{\xi = x\} = \frac{a_x z^x}{f(z)}; \quad x = 0, 1, 2, \ldots. \tag{1}$$

The above conditions insure $P\{\xi = x\} \ge 0$ for all $x$; besides

$$\sum_x P\{\xi = x\} = 1.$$

The distribution of $\xi$ may be called the power series distribution (p.s.d.).

The mean of such a distribution is

$$E(\xi) = \sum_x x\, P\{\xi = x\} = \frac{1}{f(z)} \sum_x x\, a_x z^x.$$

Hence it follows that

$$E(\xi) = \mu'_1 = z \frac{d}{dz} \log f(z). \tag{2}$$

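We have for the moments about the origin

$$\mu'_r = \frac{1}{f(z)} \sum_x x^r a_x z^x,$$

and hence

$$z \frac{d\mu'_r}{dz} = \frac{1}{f(z)} \sum_x x^{r+1} a_x z^x - z \frac{f'(z)}{f(z)} \cdot \frac{1}{f(z)} \sum_x x^r a_x z^x.$$

Thus we have the recurrence relation

$$\mu'_{r+1} = z \frac{d\mu'_r}{dz} + \mu'_1 \mu'_r. \tag{3}$$

The central moments are

$$\mu_r = \sum_x (x - \mu'_1)^r P\{\xi = x\} = \frac{1}{f(z)} \sum_x (x - \mu'_1)^r a_x z^x,$$

and hence

$$z \frac{d\mu_r}{dz} = \frac{1}{f(z)} \sum_x x (x - \mu'_1)^r a_x z^x - r z \frac{d\mu'_1}{dz} \cdot \frac{1}{f(z)} \sum_x (x - \mu'_1)^{r-1} a_x z^x - z \frac{f'(z)}{f(z)} \cdot \frac{1}{f(z)} \sum_x (x - \mu'_1)^r a_x z^x.$$

The sum of the first and third term will be found to be $\mu_{r+1}$, hence

$$z \frac{d\mu_r}{dz} = \mu_{r+1} - r z \frac{d\mu'_1}{dz} \mu_{r-1},$$

whence we have for the central moments of a p.s.d. the recurrence relation

$$\mu_{r+1} = z \frac{d\mu_r}{dz} + r z \frac{d\mu'_1}{dz} \mu_{r-1}. \tag{4}$$

Putting $r = 1$, $\mu_0 = 1$, $\mu_1 = 0$, we get the variance of $\xi$

$$\mu_2 = \sigma^2(\xi) = z \frac{d\mu'_1}{dz} = z^2 \frac{d^2}{dz^2} \log f(z) + \mu'_1. \tag{5}$$

By (5), (4) assumes the form

$$\mu_{r+1} = z \frac{d\mu_r}{dz} + r \mu_2 \mu_{r-1}. \tag{4'}$$

The characteristic function of $\xi$ is

$$\varphi(t) = \sum_x e^{itx} P\{\xi = x\} = \frac{1}{f(z)} \sum_x a_x e^{itx} z^x,$$

or

$$\varphi(t) = \frac{f(z e^{it})}{f(z)}. \tag{6}$$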
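The recurrence (4') can be checked numerically for any concrete coefficient sequence. The Python sketch below is an illustration under stated assumptions: the series is truncated after 100 terms, the derivative $d\mu_r/dz$ is replaced by a central difference, and the Poisson coefficients $a_x = 1/x!$ serve as the test case. The function names are hypothetical.

```python
import math


def central_moments(coeff, z, r_max, terms=100):
    """mu_0 .. mu_{r_max} of the distribution with P{x} proportional to coeff(x) z^x."""
    w = [coeff(x) * z ** x for x in range(terms)]
    f = sum(w)
    mean = sum(x * wx for x, wx in enumerate(w)) / f
    return [sum((x - mean) ** r * wx for x, wx in enumerate(w)) / f
            for r in range(r_max + 1)]


def check_recurrence(coeff, z, r, h=1e-5):
    """Return (mu_{r+1}, z * d(mu_r)/dz + r * mu_2 * mu_{r-1}) for r >= 1."""
    mu = central_moments(coeff, z, r + 1)
    d_mu_r = (central_moments(coeff, z + h, r)[r]
              - central_moments(coeff, z - h, r)[r]) / (2 * h)
    return mu[r + 1], z * d_mu_r + r * mu[2] * mu[r - 1]


def poisson_coeff(x):
    """Coefficients of e^z."""
    return 1.0 / math.factorial(x)


for r in (1, 2, 3):
    print(r, check_recurrence(poisson_coeff, 1.5, r))
```

For the Poisson case the two sides should agree with the known central moments $z$, $z$ and $z + 3z^2$ up to the error of the finite difference.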


To get a relation connecting the cumulants $\kappa_r$ and the moments $\mu'_r$ about the
origin, we differentiate both sides of the identity

$$\sum_{r=1}^{\infty} \kappa_r \frac{(it)^r}{r!} = \log \sum_{r=0}^{\infty} \mu'_r \frac{(it)^r}{r!}$$

with respect to $(it)$; identifying coefficients in $(it)^{r-1}$ we get¹

$$\mu'_r = \sum_{j=1}^{r} \binom{r-1}{j-1} \mu'_{r-j}\, \kappa_j. \tag{7}$$


Differentiation of (7) with respect to z gives 


(7') 




Mr —j 


d,K, 
dz X 


Substitution of (7) and (7 ; ) in (3) gives 


r+1 / 

e(, 

j-l V 


Mr+l—J Kj 


E 

j-i 


}:0f 


(is 


i ' ' 

+ MlMr-j 


. I dx, 

Ki+z ^Tz 


or by (3) after a little re-arrangement 

® - 3 s C=0 -- # - § C - 0 


2. Special cases.

(a) Choosing $f(z) = e^z$, $\xi$ has the Poisson distribution

$$P\{\xi = x\} = e^{-z} \frac{z^x}{x!}. \tag{1a}$$

(2) and (5) are the well known relations $E(\xi) = \sigma^2(\xi) = z$; the recurrence formula
(4) assumes the form²

$$\mu_{r+1} = z \left( \frac{d\mu_r}{dz} + r \mu_{r-1} \right). \tag{4a}$$

(b) Taking $f(z) = (1 - z)^{-k}$, $k > 0$, $0 < z < 1$ we get the so-called negative
binomial distribution

$$P\{\xi = x\} = \binom{k + x - 1}{x} z^x (1 - z)^k, \quad x = 0, 1, 2, \ldots. \tag{1b}$$

The mean is

$$E(\xi) = \frac{kz}{1 - z}, \tag{2b}$$

while the recurrence formula for the central moments is

$$\mu_{r+1} = z \left[ \frac{d\mu_r}{dz} + \frac{rk}{(1 - z)^2} \mu_{r-1} \right], \tag{4b}$$

hence the first three moments of this distribution are

$$\sigma^2(\xi) = \mu_2 = \frac{kz}{(1 - z)^2}, \quad \mu_3 = \frac{kz(1 + z)}{(1 - z)^3}, \quad \mu_4 = \frac{kz(1 + 4z + z^2 + 3kz)}{(1 - z)^4}. \tag{5b}$$

¹ Cf. M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, p. 87.
² Cf. Craig, Am. Math. Soc. Bull., Vol. 40 (1934), p. 262.

The characteristic function of the distribution is

$$\varphi(t) = \left( \frac{1 - z}{1 - z e^{it}} \right)^k. \tag{6b}$$


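Writing $z = \eta/(1 + \eta)$, $k = h/\eta$, $\eta > 0$, $h > 0$ we get the so-called
Polya-Eggenberger distribution for rare contagious events³

$$P\{\xi = x\} = \frac{\Gamma(h\eta^{-1} + x)}{x!\, \Gamma(h\eta^{-1})} \left( \frac{\eta}{1 + \eta} \right)^x (1 + \eta)^{-h/\eta}, \quad x = 0, 1, 2, \ldots. \tag{1b$_1$}$$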

The first four moments of this distribution are

$$\mu'_1 = h, \tag{2b$_1$}$$

$$\mu_2 = h(1 + \eta), \tag{5b$_1$}$$

$$\mu_3 = h(1 + \eta)(1 + 2\eta),$$

$$\mu_4 = h(1 + \eta)[1 + 3(1 + \eta)(h + 2\eta)].$$
To obtain a recurrence relation for the moments consider 


dn r dflr dl\ . d/ir dh _ , \2 

dz dr, dz^ dh dz 1 ^ 1,7 


d/Jr , /t £Mr 

L 6j? *7 dh , 


hence we find for this distribution by (4) and (4b) 

(4bi) Mr+i = (1 + 7 ) |\ + h -f- r/iMr-iJ. 

It follows from (4bi), that Mi is a polynomial in 77 and h. The characteristic func¬ 
tion of this distribution is 


( 6 bi) 


<p(t) = [1 + n(l - s' 1 )] 
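The substitution $z = \eta/(1+\eta)$, $k = h/\eta$ can be verified by evaluating the negative binomial moments at the transformed parameters and comparing them with the polynomial forms in $\eta$ and $h$. This is only an illustrative sketch; the function names and the sample values of $h$ and $\eta$ are assumptions.

```python
def negative_binomial_moments(k, z):
    """Mean, mu_2, mu_3, mu_4 in the (k, z) parametrization."""
    q = 1.0 - z
    return (k * z / q,
            k * z / q ** 2,
            k * z * (1 + z) / q ** 3,
            k * z * (1 + 4 * z + z ** 2 + 3 * k * z) / q ** 4)

def polya_eggenberger_moments(h, eta):
    """Mean, mu_2, mu_3, mu_4 in the (h, eta) parametrization."""
    s = 1.0 + eta
    return (h,
            h * s,
            h * s * (1 + 2 * eta),
            h * s * (1 + 3 * s * (h + 2 * eta)))

h, eta = 1.2, 0.5                      # arbitrary illustrative values
z, k = eta / (1 + eta), h / eta        # the substitution used in the text
print(negative_binomial_moments(k, z))
print(polya_eggenberger_moments(h, eta))
```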




(c) The coefficients of the series $-\log(1 - z) = \sum_{x=1}^{\infty} z^x / x$ are positive; the
associated distribution derived is

(1c)    $P\{\xi = x\} = -\frac{z^x}{x \log(1 - z)}, \quad 0 < z < 1;\ x = 1, 2, \ldots,$

and has the mean

(2c)    $E(\xi) = -\frac{z}{(1 - z)\log(1 - z)}.$

³ Cf. Zeits. f. angew. Math. und Mech., Vol. 3 (1923), pp. 279-289.





Recurrence formula (4) has for this distribution the form

(4c)    $\mu_{r+1} = z \left[ \frac{d\mu_r}{dz} - r \frac{z + \log(1 - z)}{(1 - z)^2 [\log(1 - z)]^2} \mu_{r-1} \right],$

while the variance and the characteristic function of this distribution are

(5c)    $\sigma^2(\xi) = -z \frac{z + \log(1 - z)}{(1 - z)^2 [\log(1 - z)]^2},$

(6c)    $\varphi(t) = \frac{\log(1 - z e^{it})}{\log(1 - z)}.$
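For the logarithmic case the mean $-z/[(1-z)\log(1-z)]$ and the variance $-z[z+\log(1-z)]/\{(1-z)^2[\log(1-z)]^2\}$ can be compared with a direct summation. A minimal sketch follows; the function names and the truncation point are assumptions, not the author's procedure.

```python
import math

def logarithmic_series_moments(z, terms=2000):
    """Mean and variance of P{xi = x} = -z^x / (x log(1 - z)), x = 1, 2, ...,
    by direct summation of a truncated series (assumed negligible tail)."""
    norm = -math.log(1.0 - z)
    xs = range(1, terms + 1)
    probs = [z ** x / (x * norm) for x in xs]
    mean = sum(x * p for x, p in zip(xs, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    return mean, variance

def logarithmic_closed_forms(z):
    """Closed forms for the mean and the variance."""
    log_term = math.log(1.0 - z)
    mean = -z / ((1.0 - z) * log_term)
    variance = -z * (z + log_term) / ((1.0 - z) ** 2 * log_term ** 2)
    return mean, variance

print(logarithmic_series_moments(0.5))
print(logarithmic_closed_forms(0.5))
```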

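(d) The coefficients of the series $\log \frac{1 + z}{1 - z} = 2 \sum_{x=0}^{\infty} \frac{z^{2x+1}}{2x + 1}$
are positive, so we can derive a random variable $\xi$ with the distribution

(1d)    $P\{\xi = 2x + 1\} = \frac{2 z^{2x+1}}{(2x + 1) \log \frac{1 + z}{1 - z}}, \quad 0 < z < 1;\ x = 0, 1, 2, 3, \ldots.$

$\xi$ has the mean

(2d)    $E(\xi) = \frac{2z}{(1 - z^2) \log \frac{1 + z}{1 - z}};$

the recurrence formula (4) assumes the form

(4d)    $\mu_{r+1} = z \left[ \frac{d\mu_r}{dz} + 2r \frac{(1 + z^2) \log \frac{1 + z}{1 - z} - 2z}{(1 - z^2)^2 \left[ \log \frac{1 + z}{1 - z} \right]^2} \mu_{r-1} \right],$

while the variance and the characteristic function of $\xi$ are

(5d)    $\sigma^2(\xi) = 2z \frac{(1 + z^2) \log \frac{1 + z}{1 - z} - 2z}{(1 - z^2)^2 \left[ \log \frac{1 + z}{1 - z} \right]^2},$

(6d)    $\varphi(t) = \frac{\log(1 + z e^{it}) - \log(1 - z e^{it})}{\log(1 + z) - \log(1 - z)}.$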

(e) Likewise the coefficients of the series

$\sin^{-1} z = z + \sum_{x=1}^{\infty} \frac{1 \cdot 3 \cdot 5 \cdots (2x - 1)}{2 \cdot 4 \cdot 6 \cdots (2x)} \frac{z^{2x+1}}{2x + 1}$

are positive; the derived variable $\xi$ with the distribution

$P\{\xi = 1\} = z (\sin^{-1} z)^{-1},$

(1e)    $P\{\xi = 2x + 1\} = \frac{1 \cdot 3 \cdots (2x - 1)}{2 \cdot 4 \cdot 6 \cdots (2x)} \frac{z^{2x+1}}{2x + 1} (\sin^{-1} z)^{-1}, \quad 0 < z < 1;\ x = 1, 2, 3, \ldots,$

has the mean

(2e)    $E(\xi) = \frac{z}{\sqrt{1 - z^2}\, \sin^{-1} z}.$

The recurrence formula for the moments

(4e)    $\mu_{r+1} = z \left[ \frac{d\mu_r}{dz} + r \frac{\sin^{-1} z - z \sqrt{1 - z^2}}{(1 - z^2)^{3/2} (\sin^{-1} z)^2} \mu_{r-1} \right]$

gives the variance

(5e)    $\sigma^2(\xi) = z \frac{\sin^{-1} z - z \sqrt{1 - z^2}}{(1 - z^2)^{3/2} (\sin^{-1} z)^2}.$

The characteristic function assumes the form

(6e)    $\varphi(t) = \frac{\sin^{-1}(e^{it} z)}{\sin^{-1} z}.$
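The arcsine case can be checked in the same way: the weights $a_x z^x$ are built from the series for $\sin^{-1} z$ and the resulting mean and variance are compared with $z/(\sqrt{1-z^2}\,\sin^{-1} z)$ and $z[\sin^{-1} z - z\sqrt{1-z^2}]/[(1-z^2)^{3/2}(\sin^{-1} z)^2]$. The sketch below is illustrative only; the names and the truncation are assumptions.

```python
import math

def arcsine_series_moments(z, terms=2000):
    """Series total, mean and variance of the distribution derived from the
    power series of asin(z); the support is the odd integers 1, 3, 5, ..."""
    coeff = 1.0                  # 1*3*...*(2x-1) / (2*4*...*(2x)); equals 1 for x = 0
    weights = {}
    for x in range(terms):
        if x > 0:
            coeff *= (2 * x - 1) / (2 * x)
        weights[2 * x + 1] = coeff * z ** (2 * x + 1) / (2 * x + 1)
    total = sum(weights.values())            # approximates asin(z)
    mean = sum(v * w for v, w in weights.items()) / total
    variance = sum((v - mean) ** 2 * w for v, w in weights.items()) / total
    return total, mean, variance

def arcsine_closed_forms(z):
    """asin(z) together with the closed forms for the mean and the variance."""
    s = math.asin(z)
    root = math.sqrt(1.0 - z * z)
    mean = z / (root * s)
    variance = z * (s - z * root) / (root ** 3 * s ** 2)
    return s, mean, variance

print(arcsine_series_moments(0.6))
print(arcsine_closed_forms(0.6))
```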


(f) It is well known that series (b), (c), (d), and (e) are special cases of the
hypergeometric function $F(a, b; c; z)$. This function gives a p.s.d. if $ab/c > 0$.
If $a > 0$, $b > 0$, $c > 0$ or if $a < 0$, $b < 0$, $c > 0$, $a$, $b$ integers, there exist no
further restrictions on these parameters. Suppose $a < 0$, $b < 0$, $c > 0$, $a$ integer,
$b$ not; we must have $[b] \le a$⁴; if neither $a$ nor $b$ are integers, we must have
$[a] = [b]$. Suppose $a < 0$, $b > 0$, $c < 0$. If $c$ is an integer, $a$ must be an integer
$\ge c$. If $a$ is an integer, but $c$ not, we must have $[c] \le a$. Finally if neither $a$
nor $c$ are integers, we must have $[a] = [c]$. Corresponding conditions are valid
if $a > 0$, $b < 0$, $c < 0$. Regarding

$\frac{d}{dz} F(a, b; c; z) = \frac{ab}{c} F(a + 1, b + 1; c + 1; z),$

the mean of a random variable $\xi$ with hypergeometric distribution is

(2f)    $E(\xi) = \frac{ab}{c}\, z\, \frac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)}.$

Considering the differential equation

$z(1 - z) f''(z) + [c - (a + b + 1)z] f'(z) - ab f(z) = 0,$

(5) gives the variance of $\xi$

(5f)    $\sigma^2(\xi) = \frac{ab}{c} \frac{z}{1 - z} \left\{ c + [1 - c + (a + b)z] \frac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)} - \frac{ab}{c} z(1 - z) \left[ \frac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)} \right]^2 \right\}.$

The higher moments of this distribution can now be derived from (4).

⁴ $[b]$ means as usual the greatest integer $\le b$.





THE GEOMETRIC RANGE FOR DISTRIBUTIONS OF CAUCHY’S TYPE 

By E. J. Gumbel and R. D. Keeney
New York City and Metropolitan Life Insurance Company

1. Introduction. We consider large samples drawn from a symmetrical unlimited
population whose distribution is of the Cauchy type, defined by the
properties

(1)    $\lim_{x \to \infty} x^k [1 - F(x)] = A, \qquad \lim_{x \to -\infty} (-x)^k F(x) = A,$

where $k$ and $A$ are positive and $F(x)$ stands for the probability function. This
type of distribution has no moments of an order equal to or greater than $k$.
We construct the distribution of a certain function of the extreme values, and
require only the knowledge of the type of the initial distribution, not of the
distribution itself.

From each sample we pick out the largest and smallest observations, $x_n$ and
$x_1$. If the median of the initial distribution is zero, and the sample size is large
enough, the probability of any extreme $x_n$ or $-x_1$ being negative can be neglected.
If we draw $N$ such samples, each of large size $n$, we obtain $N$ pairs of extremes,
$x_{n,\nu}$ and $x_{1,\nu}$ ($\nu = 1, 2, 3, \ldots, N$). For each sample we can then compute the
geometric mean, $\rho$, of these extremes:

(2)    $\rho = \sqrt{x_n (-x_1)},$

which we henceforth call the geometric range.

The distribution of these geometric ranges can be obtained directly from the 
joint asymptotic distribution of the extremes. However, it is easier to obtain 
this distribution indirectly from the distribution of the reciprocal of the geometric 
range. This distribution of the reciprocal is of interest in itself: since it possesses 
all moments we can use it to estimate the parameters by the method of moments, 
whereas this problem seems to be very intricate if we start from the distribution 
of the geometric range itself.

2. The distribution of the reciprocal of the geometric range. The distribution
of the reciprocal of the geometric range follows from a theorem of Elfving
[1] which may be stated thus:

"Let $x$ be a symmetrical unlimited variate with probability $F(x)$. Let $\xi$ be
defined by

(3)    $\xi = 2n \sqrt{F(x_1)[1 - F(x_n)]}.$

Then the asymptotic density function $g(\xi)$ and the asymptotic probability $G(\xi)$
of $\xi$ are:

(4)    $g(\xi) = \xi K_0(\xi); \qquad G(\xi) = 1 - \xi K_1(\xi),$

where $K_0$ and $K_1$ are the modified Bessel functions of the second kind and of
order zero and one."
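The pair $g(\xi) = \xi K_0(\xi)$, $G(\xi) = 1 - \xi K_1(\xi)$ can be evaluated without tables. The sketch below is an assumption-laden illustration: it computes $K_\nu$ from the standard integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ by the trapezoidal rule (the step and cut-off are chosen ad hoc), and the function names are hypothetical.

```python
import math

def bessel_k(nu, x, step=0.01, t_max=12.0):
    """Modified Bessel function of the second kind, K_nu(x) for x > 0,
    via the trapezoidal rule on the integral of exp(-x cosh t) cosh(nu t).
    The cut-off t_max = 12 is an assumption adequate for x >= 0.01."""
    n = int(t_max / step)
    total = 0.5 * math.exp(-x)          # half weight at t = 0, where cosh(0) = 1
    for i in range(1, n + 1):
        t = i * step
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * step

def elfving_density(xi):
    """Asymptotic density g(xi) = xi * K_0(xi)."""
    return xi * bessel_k(0, xi)

def elfving_probability(xi):
    """Asymptotic probability G(xi) = 1 - xi * K_1(xi)."""
    return 1.0 - xi * bessel_k(1, xi)

# G should be the integral of g: compare a numerical slope of G with g.
h = 1e-4
slope = (elfving_probability(1.0 + h) - elfving_probability(1.0 - h)) / (2 * h)
print(slope, elfving_density(1.0))
```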





Introducing instead of $A$ the parameter $u$ defined by $F(u) = 1 - 1/n$ we
have, from (1), approximately for large $n$

(5)    $F(x_1) = \frac{1}{n} \frac{u^k}{(-x_1)^k}; \quad 1 - F(x_n) = \frac{1}{n} \frac{u^k}{x_n^k}; \quad x_1 < 0,\ x_n > 0,\ k > 0.$

For the variable $\xi$ in Elfving's theorem, we obtain asymptotically

(6)    $\xi_k / 2 = u^k \rho^{-k}.$

We attach a subscript $k$ to $\xi$ to show its dependence on $k$. The moments of $\xi_k$ are
obtained from a formula given by Watson ([3], p. 388) as

(7)    $\overline{\xi_k^{\,l}} = 2^l\, \Gamma^2(1 + l/2)$

and all moments of this variate exist.

3. Estimate of parameters. From $N$ sets, each of $n$ observations, we pick out
the largest and the smallest, $X_{n,\nu}$ and $X_{1,\nu}$. We subtract from each observed
extreme the central value, $m$, of the $Nn$ observations. If each $x_{n,\nu} = X_{n,\nu} - m > 0$
and $x_{1,\nu} = X_{1,\nu} - m < 0$ the sample size is large enough.

Define $\eta = 1/\rho$. The first two moments of $\eta$ are, from (7),

(8)    $\bar{\eta} = \frac{1}{u} \Gamma^2(1 + 1/2k), \qquad \overline{\eta^2} = \frac{1}{u^2} \Gamma^2(1 + 1/k).$

Elimination of the parameter $u$ from these two equations leads to

$\frac{\overline{\eta^2}}{\bar{\eta}^2} = \frac{\Gamma^2(1 + 1/k)}{\Gamma^4(1 + 1/2k)}.$

In terms of the coefficient of variation, $V$, this equation becomes

(9)    $\sqrt{1 + V^2} = \Gamma(1 + 1/k) / \Gamma^2(1 + 1/2k).$

Substituting the value of $V$ computed from the observations, we obtain an estimate
of $k$, and hence can obtain an estimate of $u$ from (8). This procedure is
facilitated by Table 1.

4. The distribution of the geometric range. From a practical standpoint
the geometric range itself is preferable to its reciprocal since it is easier to interpret
and easier to calculate from the observed extremes. We want to establish its
distribution $g_1(\rho)$. From the relation (6) of $\rho$ to $\xi_k$ and the knowledge of the
distribution (4) of $\xi_k$, we find

(10)    $G_1(\rho) = 1 - G(\xi_k) = 2u^k \rho^{-k} K_1(2u^k \rho^{-k})$

and

(11)    $g_1(\rho) = \frac{4k}{u} \left( \frac{u}{\rho} \right)^{2k+1} K_0(2u^k \rho^{-k}).$

Since tables of these Bessel functions are available [2], the various probabilities
and densities may be evaluated.

The simplest way to compare geometric ranges to the theory is the use of a
probability paper (Figure 1). For its construction, consider the linear relation

(12)    $\log \rho = \log u + (\log 2)/k - (\log \xi_k)/k$

obtained from (6). Consequently we plot $-\log \xi_k$ on the abscissa and write the
corresponding values $G_1(\rho)$, formula (10), on a horizontal axis. An upper parallel
to the abscissa shows the return periods. The observed geometric ranges are
plotted on the ordinate in a logarithmic scale. If the theory holds, the observed
geometric ranges should be scattered about the straight line (12).


TABLE 1

The order k and the variation V of the reciprocal of the geometric range

Reciprocal order   Coefficient of      Reciprocal order   Coefficient of
      1/k           variation V              1/k           variation V

      .10              .088                  .70              .556
      .12              .104                  .80              .632
      .16              .138                  .90              .709
      .20              .171                  .98              .772
      .30              .251                 1.00              .788
      .40              .332                 2.00             1.73
      .50              .404                 4.00             5.92
      .60              .480                 6.00            20.0
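The entries of Table 1 follow from $\sqrt{1 + V^2} = \Gamma(1 + 1/k)/\Gamma^2(1 + 1/2k)$, and the same relation can be inverted numerically in place of table look-up. The following is a sketch under stated assumptions: the function names are hypothetical, and the bisection bracket $10^{-6} \le 1/k \le 10$ is an arbitrary choice (the ratio of gamma functions is increasing in $1/k$, so bisection applies).

```python
import math

def coefficient_of_variation(inv_k):
    """V as a function of 1/k from sqrt(1 + V^2) = Gamma(1 + 1/k) / Gamma^2(1 + 1/(2k))."""
    ratio = math.gamma(1.0 + inv_k) / math.gamma(1.0 + inv_k / 2.0) ** 2
    return math.sqrt(max(ratio * ratio - 1.0, 0.0))

def estimate_inverse_order(v, lo=1e-6, hi=10.0, tol=1e-10):
    """Solve coefficient_of_variation(1/k) = v for 1/k by bisection.
    Assumes the solution lies inside [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coefficient_of_variation(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for inv_k in (0.10, 0.50, 1.00, 2.00, 4.00):
    print(inv_k, round(coefficient_of_variation(inv_k), 3))
```

With an observed $V$, `estimate_inverse_order(V)` returns the estimate of $1/k$; the estimate of $u$ then follows from the first moment of $\eta$.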


If less accurate estimates of $u$ and $k$ than those obtainable by the systematic
methods (8) and (9), or the probability paper, will suffice, quick estimates can be
obtained from the quantiles of the sample of geometric ranges. To the value
$\rho = u$ corresponds, according to (6), $\xi_k = 2$ whence, from the tables [2], $G_1(u) =
2K_1(2) = .27973$. From $N$ observed geometric ranges arranged in increasing
magnitude we thus may pick out the $m$th, $\rho_m$, with the rank $m = .28N$ and
use it as an estimate $\hat{u} = \rho_m$. For the medians $\tilde{\xi}_k$ and $\tilde{\rho}$ we get $\tilde{\xi}_k = 1.257$ from
the tables, and thus, by (6), $\tilde{\rho}^k = 1.591 u^k$. This formula provides a quick estimate
of $k$. We pick out the median $\tilde{\rho}$ of the $N$ observed geometric ranges. Since we
have an estimate of $u$, we obtain an estimate of $k$ from

(13)    $\frac{1}{k} = \frac{\log \tilde{\rho} - \log u}{\log 1.591} = 4.960 \log [\tilde{\rho}/\rho_m].$
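The quick estimates can be written as a short routine. This is a sketch only: the function name is hypothetical, the rank $.28N$ is rounded to the nearest integer, and the logarithm in the estimate of $1/k$ is taken to be the common (base 10) logarithm, which is an inference from the constant $4.960 \approx 1/\log_{10} 1.591$ rather than something the text states.

```python
import math

def quick_estimates(geometric_ranges):
    """Quick estimates of u and 1/k from a sample of geometric ranges.
    u is estimated by the order statistic of rank 0.28 N (rounded), and
    1/k by 4.960 * log10(median / u_hat); base 10 is an assumption."""
    ordered = sorted(geometric_ranges)
    n = len(ordered)
    rank = max(1, round(0.28 * n))
    u_hat = ordered[rank - 1]
    middle = n // 2
    if n % 2:
        median = ordered[middle]
    else:
        median = 0.5 * (ordered[middle - 1] + ordered[middle])
    inverse_k = 4.960 * math.log10(median / u_hat)
    return u_hat, inverse_k

# Purely artificial data, used only to show the calling convention.
u_hat, inverse_k = quick_estimates([float(i) for i in range(1, 101)])
print(u_hat, inverse_k)
```

The routine presupposes positive geometric ranges and a median exceeding the chosen order statistic; otherwise the estimate of $1/k$ is zero or negative and should be discarded.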


5. Analogy between the geometric range and the range. A study of the various
characteristics of the geometric range for distributions of Cauchy's type reveals
structural similarities to the range for distributions of the exponential type.



[Figure 1. Probability paper for the geometric range.]


This is not altogether surprising, since (as shown in Table 2) after the appropriate
transformations the probabilities of both are identical functions of the respective
transformed variates.

Of course the two systems are mutually exclusive: if the observed ranges can
be reproduced by the first system we conclude that all moments in the initial
distribution exist. If on the other hand, the observed geometric ranges can be
represented by the second system we conclude that no moments of an order
greater than $k$ exist.


TABLE 2

RANGES AND GEOMETRIC RANGES

Type of Initial
Distribution      Exponential                                        Cauchy

Variate           Range                                              Geometric Range

Definition        $w = x_n + (-x_1)$                                 $\rho = \sqrt{x_n (-x_1)}$

Transformation    $z = 2 \exp[-\frac{\alpha}{2}(x_n - x_1 - 2u)]$    $\xi_k = 2u^k \rho^{-k}$

Logarithm         $\lg z = \lg 2 - \frac{\alpha}{2}(x_n - x_1 - 2u)$ $\lg \xi_k = \lg 2 - \frac{k}{2}(\lg x_n + \lg(-x_1) - 2 \lg u)$

Probability       $G(w) = z K_1(z)$                                  $G_1(\rho) = \xi_k K_1(\xi_k)$

Distribution      $g(w) = \frac{\alpha}{2} z^2 K_0(z)$               $g_1(\rho) = \frac{4k}{u} (\frac{u}{\rho})^{2k+1} K_0(\xi_k)$

Median            $\tilde{w} = 2u + .9286/\alpha$                    $2 \lg \tilde{\rho} = 2 \lg u + .9286/k$

Mean              $\bar{w} = 2u + 2\gamma/\alpha$                    $\lg \bar{\rho} = \lg u + 2 \lg \Gamma(1 - 1/2k)$


REFERENCES 

[1] G. Elfving, "The asymptotical distribution of range in samples from a normal population,"
    Biometrika, Vol. 35 (1947).

[2] Tables of the Bessel-functions, Vol. 6, British Association for the Advancement of Science,
    Cambridge, 1937.

[3] G. N. Watson, Theory of Bessel-functions, Cambridge University Press, 1944.


REMARK ON W. M. KINCAID'S "NOTE ON THE ERROR IN
INTERPOLATION OF A FUNCTION OF TWO
INDEPENDENT VARIABLES"

By T. N. E. Greville
Federal Security Agency

In a review of Dr. W. M. Kincaid's "Note on the Error in Interpolation of a
Function of Two Independent Variables" (Annals of Math. Stat., Vol. 19 (1948),
pp. 85-88) which appeared in Mathematical Reviews, Vol. 9 (1948), p. 470, I
stated that "a more simple and elegant, and equally general, expression is obtainable
by a simple adaptation of formula (41), p. 215, of J. F. Steffensen's
book, Interpolation."

This statement is not entirely correct and is also misleading in its implications 
since Dr. Kincaid’s expressions are actually more general in certain respects, and 
simplicity and generality are not the only considerations nor, in this case, the
most important ones. In setting up an expression for the remainder in an inter¬ 
polation formula, the primary objective is to secure an efficient appraisal of the 
remainder. In this respect, Dr. Kincaid’s expressions are superior as they involve 
only the higher derivatives of the function it is desired to represent, whereas 
Steffensen’s method would always involve a first derivative term in such a way 
as to prevent any refinement of estimates of the error by introducing additional 
given values. 


REMARK ON MY PAPER “ON A THEOREM OF HSU AND ROBBINS" 

By P. Erdös
Syracuse University

Professor Robbins kindly pointed out that in my paper mentioned in the title
(Annals of Math. Stat., Vol. 20 (1949), pp. 286-291) I have misquoted a statement
in the paper of Hsu and Robbins ("Complete Convergence and the Law of
Large Numbers," Proc. Nat. Acad. of Sci., Vol. 33 (1947), pp. 25-31). I attribute
to Hsu and Robbins the conjecture (notations of my paper) that if $\sum_{n=1}^{\infty} M_n < \infty$
then (1) and (2) hold, and proceed to give a counter example. However, the
conjecture of Hsu and Robbins is not the above false one but the following: If
$\sum_{n=1}^{\infty} M_n < \infty$ and (1) holds then (2) also holds. This conjecture is true and is in
fact proved in my paper.

Professor Robbins also points out that a slight modification of my theorem
can be stated in a more concise form as follows: Let $X_1, X_2, \ldots$ be a sequence of
independent random variables having the same distribution function $F(x)$, and let

$Y_n = (1/n)(X_1 + \cdots + X_n).$

Then the necessary and sufficient condition that

$\sum_{n=1}^{\infty} \Pr\{|Y_n| > \epsilon\} < \infty$

for every $\epsilon > 0$ is that

$\int_{-\infty}^{\infty} x\, dF(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\, dF(x) < \infty.$





ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the New York meeting of the Institute,
December 27-30, 1949)

1. The Asymptotic Distribution of the Extremal Quotient. E. J. Gumbel, New
York, and R. D. Keeney, Metropolitan Life Insurance Company, New York.

The extremal quotient is the ratio of the largest to the absolute value of the smallest
observation. Its analytical properties for symmetrical, continuous and unlimited distributions
are obtained from a study of the auto-quotient defined as the ratio of two non-negative
variates with identical distributions. The relation of the two statistics is established
by proving that, for sufficiently large samples from an initial distribution with median
zero, the largest (or smallest) value may be assumed to be positive (or negative) and that
the extremes are independent. The logarithm of the extremal quotient has asymptotically
a symmetrical distribution. Its median is unity. As many moments exist for the extremal
quotient as moments and reciprocal moments exist simultaneously for the initial variate.
For the exponential type of initial distributions, the asymptotic distribution of the extremal
quotient can only be expressed by a complicated integral which may be approximated
in the interval $1/2 < q < 2$ by the logarithmically transformed normal probability
function. In this case, no moments exist. For the Cauchy type, the asymptotic distribution
of the extremal quotient is very simple. The logarithm of the extremal quotient has the
same (logistic) distribution as the midrange for initial distributions of exponential type.
For both initial types, the asymptotic distributions of the extremal quotients possess one
parameter which may be estimated from the observations.

2. A Second Formula for Partial Sums of Hypergeometric Series having the
Unit as Fourth Argument. Hermann von Schelling, Naval Medical Research
Laboratory, U. S. Submarine Base, New London, Conn.

If the arguments $\alpha$ and $\beta$ are changed after the summation, published Ann. Math. Stat.
Vol. 20 (1949), p. 120, and this method is applied a second time, a new formula results for
partial sums of $F(\alpha, \beta, \gamma; 1)$. A simple recurrence formula is developed for these partial
sums. The new equation is a numerical short cut, as is demonstrated with an example.

3. A Coverage Distribution. Herbert Solomon, Office of Naval Research,
Washington, D. C.

Consider a fixed target circle of radius $T_R$ and center at a distance $R$ from an aiming
point. Let $N$ circles each of radius $W_R$ be dropped at the aiming point with their centers
subject to a bivariate normal distribution with circular symmetry, the common standard
deviation denoted by $\sigma$. Define $y$ as the set theoretical sum of the $N$ random circles with
the fixed circle and let $c$ be the ratio of $y$ to the total area of the fixed circle. Then it is
desired to find $P_{c_0}$, where

$P_{c_0} = \Pr\{c \ge c_0 \mid T_R, W_R, R, N\}$

where $T_R$, $W_R$, and $R$ are in $\sigma$ units. Define $R^* = W_R + aT_R$ where $a = a(c_0, W_R, T_R)$;
$|a| < 1$. It is shown that for $N = 1$, the family of curves in the $RR^*$ plane
defined by $P_{c_0}$ = constant have a slope, $m$, given by

$m = \frac{I_1(RR^*)}{I_0(RR^*)}$

where $I_k$ is the modified Bessel function of $k$th order. In fact as the product
$RR^*$ approaches infinity, $m$ approaches unity. From these results, the contours of equal
probability are easily determined. When $N > 1$, overlap considerations make the computation
of explicit values for $P_{c_0}$ intractable. However, in this case, upper and lower
bounds for $P_{c_0}$ can be obtained.


4. The Problem of the Greater Mean. R. R. Bahadur and Herbert Robbins,
University of North Carolina, Chapel Hill.

"Optimum" solutions (in the sense of Wald's theory of statistical decision functions)
are obtained for the "problem of the greater mean". Let $\pi_i$ ($i = 1, 2$) be normal populations
with means $m_i$ and common variance $\sigma^2$, all unknown, and denote the arbitrary but
given set of possible parameter points $\omega = (m_1, m_2; \sigma)$ by $\Omega$. Suppose that a set of $n_1 + n_2$
independent observations is drawn, $n_i$ from $\pi_i$, and let $v = (x_{11}, \ldots, x_{1n_1}; x_{21}, \ldots, x_{2n_2})$
denote the sample point. Any measurable function $f(v)$ such that $0 \le f(v) \le 1$ is called
a decision function. Given a "risk function" $r(f \mid \omega)$ defined for all $f$ and all $\omega \in \Omega$, a decision
function $f^*(v)$ is "optimal" if (i) $\sup_\omega [r(f^* \mid \omega)] = \inf_f \sup_\omega [r(f \mid \omega)]$, and (ii) no decision
function is "uniformly better" than $f^*(v)$. If $f^*(v)$ is the unique (up to sets of measure 0)
decision function with property (i), it is "optimum". Case 1. Given any decision function
$f(v)$ and any $\omega \in \Omega$, let

$r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega].$

Let

$f^0(v) = 1$ if $\bar{x}_1 > \bar{x}_2$, $\quad f^0(v) = 0$ otherwise.

It is shown that under certain conditions on $\Omega$, $f^0(v)$ is optimum. Case 2. Given any decision
function which takes on only the values 0 and 1, corresponding to the two decisions "$m_1 <
m_2$" and "$m_2 < m_1$" respectively, and any $\omega \in \Omega$, let

$r(f \mid \omega) = P(\text{incorrect decision} \mid \omega).$

It is shown that under certain conditions on $\Omega$, $f^0(v)$ is optimal. The conditions on $\Omega$ are
very similar in the two cases, and are likely to be satisfied in most applications. However,
it is shown by examples that there exist non-degenerate types of $\Omega$ with respect to which
decision functions other than $f^0(v)$ are uniformly better than $f^0(v)$. The methods
of the paper can be applied to a number of similar problems.

5. Some Extensions of Bayes' Theorem. F. C. Leone, Case Institute of Technology,
Cleveland 6, Ohio.

There is some past or a priori knowledge about the quality of a population of lots and a
sample is taken from a random lot. What can be said about the lot from which this sample
is taken? We are incorporating the results of our experiment or sample with the previous
knowledge to form a judgment. From the a priori distribution and a sample of $n$ with $c$
defectives, say two in twenty-five, we form an a posteriori distribution of all two in twenty-five
[...]. From this distribution we can answer questions such as: "What is the a posteriori
probability that a lot producing a two in twenty-five result should have a proportion
[...] or below?" We consider in our a priori situation such
distributions as the rectangular, triangular, normal, Pearson's Type III and Type I.

The methods are applied to [...] industrial data. In considering lot quality on one
hundred per cent inspection, the a priori distributions of these data are mostly
[...] bell-shaped and J-shaped. In some cases a Pearson Type I proves to
be a good fit for the a priori distribution.





6. On Optimum Selections from Multinormal Populations. Z. W. Birnbaum
and D. G. Chapman, University of Washington, Seattle.

Let $(X, Y_1, \ldots, Y_n)$ have an $(n + 1)$-dimensional non-singular normal probability
density $f(X, Y_1, \ldots, Y_n)$. By "selection" in $(Y_1, \ldots, Y_n)$ we shall understand a measurable
function $\varphi(Y_1, \ldots, Y_n)$ such that $0 \le \varphi \le 1$ for all $Y_1, \ldots, Y_n$. By a "truncation
in $(Y_1, \ldots, Y_n)$ to the set $\Omega$" we understand a selection $\varphi(Y_1, \ldots, Y_n)$ such that
$\varphi = 1$ for $(Y_1, \ldots, Y_n)$ in $\Omega$, and $\varphi = 0$ in the complement of $\Omega$. A "linear truncation" will be a truncation
to a set defined by a condition of the form $\sum_{i=1}^{n} c_i Y_i \ge k$. Using a slight generalization of
Neyman-Pearson's fundamental lemma, the following theorems are proven: among selections
for which the expectation of $X$, after selection, assumes a fixed value, the one which
maximizes the "retained" portion of the universe $\int \cdots \int \varphi(Y_1, \ldots, Y_n) f(X, Y_1, \ldots,
Y_n)\, dX\, dY_1 \cdots dY_n$ is a linear truncation. Among all the selections for which a given quantile
of $X$, after selection, assumes a fixed value, the one which maximizes the retained
portion of the universe is a linear truncation. (Research under the sponsorship of the Office
of Naval Research).

7. Simple Regression Analysis with Autocorrelated Disturbances. Howard L. 
Jones, Illinois Bell Telephone Company, Chicago. 

When the disturbances in a regression equation are connected by a linear difference 
equation, the parameters of both equations can be estimated simultaneously by maxi¬ 
mizing a function that describes the joint probability of the disturbances or a linear function
thereof. This note discusses a simple example.

8. A Test of Klein’s Model III for Changes of Structure. A. W. Marshall, 
The Rand Corporation, Santa Monica, Calif. 

This paper suggests a test of equations from linear stochastic equation systems on the
basis of observations not included in the original computation period. Rejection regions
of approximately the right size (asymptotically correct) are constructed and the use of
naive economic models as an auxiliary test is suggested. The procedure is applied to
Klein's Model III; the results are tabulated and discussed.

9. An Application of the Theory of Extreme Values to Economic Problems. 
S. B. Littauer, Columbia University, and E. J. Gumbel, New York. 

Most studies of economic time series have been concerned with establishing regularities
of behavior, often by analogy with mechanical systems. Much as regularity in economic
phenomena is desirable, such evidence as has been available leaves the reality of
this sought for regularity considerably in doubt. It seems more fruitful rather to ask the
question, "What is the pattern of the non-regularity" and if reasonably answered, to offer
some verifiable form of explanation therefor. It seems further desirable that any attempt
at "scientific" explanation of economic phenomena be fortified by evidence of statistical
stability supported by criteria such as were established by Shewhart for the control of
quality of manufactured product. In the present instance certain concepts of experimental
inference, which seem natural therefor, are employed in order to give some general and
plausible unity to the behavior of economic time series.

Following upon the postulates of the theory presented here, the appropriate formal
development employs concepts of statistical quality control and of the statistical theory
of extreme values. Within this theory the importance of the absence of statistical stability
is emphasized, and the relevance of the use of concepts in extreme values is made evident.
By introducing a superuniverse, peaks and troughs are random expressions of a super
chance-"cause" system. The use of these statistical concepts is not motivated by mere
analogy but rather as the natural means for explanation of the phenomena studied.

A number of examples of the application of these statistical methods to selected series
are offered as evidence of the workability of the theory here presented. The extremes of
the Dow-Jones index of selected industrials show that the 1928 value was completely outside
the previous levels and should not have been considered as a "stable high plateau
basic for perpetual prosperity". Instead this should have suggested the imminent breakdown.
The validity of the application of the theory of extreme values to these phenomena
is not so strongly substantiated as are the many applications that have been made of them
to flood frequencies, wind velocities, extreme temperatures, breaking strengths and other
natural phenomena. Nevertheless the results here obtained are highly suggestive of a
tenable economic hypothesis.

10. Bias Due to the Omission of Independent Variables in Ordinary Multiple 
Regression Analysis. (Preliminary Report). T. A. Bancroft, Iowa State 
College, Ames. 


Given $n$ observations of the dependent variate $y$ and the independent variates $x_1$,
$x_2, \ldots, x_k, \ldots, x_r$, $k < r$, all variates measured from their respective sample means,
suppose that we have calculated the ordinary regression of $y$ on the first $k$ variates and $y$ on all $r$
variates. We define ordinary multiple regression as the single-equation approach, error
only in $y$ which is assumed normally and independently distributed with zero mean and
variance $\sigma^2$, the $x_i$ being fixed from sample to sample.

In order to determine whether to omit or retain the last $(r - k)$ independent variates
we formulate a rule of procedure: calculate Snedecor's

$F = \frac{\text{Reduction in } \sum y^2 \text{ due to } (r - k) \text{ variates} / (r - k)}{\text{Error mean square after fitting all } r \text{ variates}}.$

If $F$ is non-significant at some assigned significance level $\alpha$, we pool the sums of squares
and degrees of freedom, involved in the numerator and denominator of $F$, to obtain an
estimate of the error variance, and fit $y$ on the first $k$ variates only. If $F$ is significant at the
assigned significance level we use the denominator only in $F$ for our estimate of $\sigma^2$ and
hence fit $y$ on all $r$ variates.

The object of this investigation is to determine the bias in our estimate $e^2$ of $\sigma^2$ if we
follow such a rule of procedure. The bias turns out to be


vx 


m + n. 


- + 




where 


So 


fit 


nj + Wi a: ’ 


? (tty 

l-t+1 
2v> ’ 


«1 and n, are the respective degrees of freedom for the numerator and denominator of F t 

Md 13 * function of the population regression coefficients &+, , -. -, p ,, The bias 

is discussed for selected values of the parameters involved. 



ABSTRACTS 


143 


11. Estimating Parameters of Pearson Type III Populations From Truncated
Samples. A. C. Cohen, Jr., The University of Georgia, Athens.

The method of moments is employed with singly truncated random samples (1) to estimate
the mean, $\mu$, and the standard deviation, $\sigma$, of a Pearson Type III population
when $\alpha_3$ is known and (2) to estimate $\mu$, $\sigma$, and $\alpha_3$ when only the form of the distribution
is known in advance. No information is assumed to be available about the number of
variates in the omitted portion of the sample. The results obtained can be readily applied
to practical problems with the aid of "Salvosa's Tables of Pearson's Type III
Function." An illustrative example is included in the paper.

12. The Cyclical Normal Distribution. E. J. Gumbel, New York. 

The usual normal distribution becomes invalid for variates, like an angle, lying on
the circumference of a circle. The distribution of such variates was established by
R. von Mises by the same methods as used for the classical derivation. The cyclical normal
distribution is symmetrical about a mode and antimode. The probability function is proportional
to an incomplete Bessel function of the first kind and of order zero for an imaginary
argument, and contains two parameters, the direction of the resultant vector and a
parameter $k$ linked to the absolute amount of the vector. The parameters may be estimated
by the method of maximum likelihood. For $k = 0$, the distribution degenerates into a uniform
cyclical distribution. If $k$ is of the order 3, the distribution approaches the linear
normal one, $k$ being the reciprocal of the variance. With increasing values of $k$, the distribution
loses its cyclical character and becomes concentrated in a narrow strip. This
distribution holds for symmetrical unimodal values varying according to pure chance
about a unique mode in a closed space (as the angles of the wind directions) or a closed
time, and gives a theoretical model for the variations of temperatures, pressures, rainfalls,
storms, discharges, floods, death- and birth rates over the year, and earthquakes
over the day. The comparison between theory and observations in plotting the square
roots of the frequency on polar coordinate paper provides a statistical criterion for the
regularity of cyclical phenomena. (Work done in part under contract W 44/109/QM/2202
with the Research and Development Branch, Office of the Quartermaster General.)

13. Treatment of Attenuation Problems by Random Sampling. H. Kahn and 
T. Harris, The Rand Corporation, Santa Monica, Calif. 

Exact analytical calculations of the transmission of energy by particles through shields 
are difficult; to avoid them random sampling methods may be resorted to. The straight¬ 
forward procedure of simulating life histories of particles, using random number tables, 
may be used for thin shields, but in the case of thick shields with tremendous attenuations,
tremendous numbers of particles would be required. In order to obtain reasonably small 
standard errors, using reasonable numbers of simulated life histories, it is necessary to 
modify the original problem to one having a lower attenuation factor, the solution bearing 
a known relation to the solution of the original problem. Alternatively, this may often 
be regarded as an application of well known statistical sampling procedures, such as repre¬ 
sentative sampling or importance sampling, Various special procedures can be devised. 
One of the first was the splitting technique due to J. v. Neumann. Among others may be 
mentioned the exponential transformation, a simple analytic transformation of the original
problem into one having a much lower attenuation factor.

14. On the Existence of Nearly Locally Best Unbiased Estimates. Herman 
Rubin, Stanford University, Stanford, Calif. 

For any family $\mathfrak{F}$ of distributions, and any distribution $F_0$ of $\mathfrak{F}$ there exists a bilinear
function $K$ whose arguments are all parameters defined for all distributions of $\mathfrak{F}$ and for
which there exist unbiased estimates which have finite variance if $F_0$ is the true distribution,
and which has the following properties: (1) If $\theta$ is any parameter in the domain of
$K$ and $t$ is any unbiased estimate of $\theta$, then $\operatorname{var}(t \mid F_0) \ge K(\theta, \theta)$. (2) This result is best
possible, i.e., for any $\theta$ there is an unbiased estimate $t$ of $\theta$ whose variance differs from
$K(\theta, \theta)$ by less than any preassigned amount.

15. The Experimental Evaluation of Multiple Definite Integrals. George W.
Tyler, U. S. Navy Electronics Laboratory, San Diego, Calif.


When one is forming an estimate of the total, or mean value, of some quantity, sampling
at carefully selected points will frequently be preferable to employing a method
which involves randomisation. The estimation of the total volume of water in a given
lake or the amount of energy being released in a given time and space are examples of
problems where specified points for sampling should result in a reduction in the error of
estimate. These and similar problems lead naturally to numerical integration methods.
In the case of single integrals, Gauss' and Tchebychef's formulae yield maximum efficiency
with respect to controlling the polynomial error and statistical error respectively, but
often the Newton-Cotes formulae can be applied more conveniently.

For the evaluation of double integrals, an eight point and a thirteen point formula for
fifth degree accuracy and a twelve point and a twenty-one point formula for seventh degree
accuracy have been developed for integrating over a rectangle, and similar formulae
have been developed for integrating over areas bounded by a parabola and a straight
line or by two parabolas. The following system of equations is employed in developing
these formulae:


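$$\sum_{\alpha=1}^{m} R_\alpha x_\alpha^{i} y_\alpha^{j} = C_{ij}, \quad \text{for all } i, j \text{ for which } i + j \le 2n,$$

and where

$$C_{ij} = \begin{cases} \dfrac{a^{i} b^{j}}{(i + 1)(j + 1)} & \text{for both } i \text{ and } j \text{ even,} \\ 0 & \text{otherwise.} \end{cases}$$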


Formulae for the numerical evaluation of triple integrals taken over a rectangular
parallelepiped are developed, including a twenty-one point formula with fifth
degree accuracy. It is shown that comparable formulae can be developed for integrating
functions of more than three variables and a 2n + 1 point formula with third degree accuracy
for integrating a function of n variables over a rectangular n-space is obtained.
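
The abstract does not state the 2n + 1 point formula itself. As a hedged illustration, one formula of this kind over the cube $[-a, a]^n$ uses the centre and the 2n face centres; the weights below are an assumption worked out from the moment conditions (total weight equal to the volume, and the second moment $x_i^2$ reproduced exactly), and need not coincide with the author's choice. The function name `cube_rule_2n_plus_1` is hypothetical.

```python
def cube_rule_2n_plus_1(f, n, a=1.0):
    """Approximate the integral of f over [-a, a]^n using 2n + 1 points.

    Points: the centre, plus the two face centres (+a and -a) on each axis.
    Weights: volume / 6 at each face centre and volume * (1 - n / 3) at the
    centre, so that every monomial of total degree <= 3 is integrated exactly.
    """
    volume = (2.0 * a) ** n
    w_face = volume / 6.0
    w_centre = volume * (1.0 - n / 3.0)
    total = w_centre * f([0.0] * n)
    for axis in range(n):
        for coordinate in (-a, a):
            point = [0.0] * n
            point[axis] = coordinate
            total += w_face * f(point)
    return total


# A cubic polynomial in three variables; its exact integral over [-1, 1]^3 is 16.
cubic = lambda x: 1 + 2 * x[0] + 3 * x[1] ** 2 + x[0] * x[1] * x[2] + x[2] ** 3
print(cube_rule_2n_plus_1(cubic, 3))
```

Odd monomials vanish by symmetry and mixed products vanish because every point lies on a coordinate axis, so only the two weight conditions have to be solved. For n greater than 3 the centre weight in this particular choice becomes negative.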

16. Tests of Fit of a Cumulative Distribution Function over Partial Range of
Sample Data. Bradford F. Kimball, New York State Dept. of Public
Service, New York.

Case 1. Sample data are completely ordered over range tested.

Let the n + 1 true frequency differences associated with an ordered random sample of
n values of x be denoted by $u_i$. The cdf of a theoretical test function based on m of the
above frequency differences is identified and methods of approximating it are discussed.

Case 2. Sample data in k ordered groups over range tested.

Let $\Delta_i F$ denote the true frequency differences over the $k$ sample intervals to be covered
by the test. Let $m_i$ denote the number of unit frequency differences $u_j$ covered by the $i$th
interval. Define $M$ and $W$ by

$$M + 1 = \sum_{i=1}^{k} m_i, \qquad M \le n,$$

$$W = \sum_{i=1}^{k} \Delta_i F, \qquad W \le 1.$$

A theoretical function $Z$ is defined by

$$Z = \frac{(M + 1)(M + 2)}{k - 1} \sum_{i=1}^{k} \frac{[\Delta_i F - m_i W/(M + 1)]^2}{m_i}.$$

Set

$$Y = Z/W^2.$$

The cdf of $Y$ is identified and methods of approximation to it are discussed.

Applications to testing agreement of sample with hypothetical cdf of universe are con¬ 
sidered for both cases in some detail. 

17. Large Sample Tests for Comparing Percentage Points of Two Arbitrary
Continuous Populations. A. W. Marshall and J. E. Walsh, The Rand
Corporation, Santa Monica, Calif.

Let us consider two continuous populations, the first with density function $f(x)$ and
$100\alpha\%$ point $\theta_\alpha$, the second with density function $g(x)$ and $100\beta\%$ point $\phi_\beta$. These two
populations are arbitrary except that $f(\theta_\alpha) \ne 0$, $g(\phi_\beta) \ne 0$ and both $f'(\theta_\alpha)$, $g'(\phi_\beta)$ exist and
are continuous in the vicinity of the specified points. This paper presents significance
tests for $\theta_\alpha - \phi_\beta$ which are based on large samples from these populations. The exact significance
level of a test is not known but its value is bounded within reasonably close limits
(asymptotically). Efficiency properties of these tests (compared to the corresponding
noncentral t-tests) are investigated for the case in which both populations are normal
and the ratio of variances is known. Results are also derived for simultaneously testing
$\theta_\alpha$ and $f(\theta_\alpha)/g(\phi_\beta)$. These tests have known significance levels (asymptotically). A
particular application of tests of this type occurs when it is desired to test whether two
samples came from the same population and agreement of the two populations in a specified
region is to be emphasized. For this special case, the significance levels of the resulting
tests are reasonably accurate for moderate as well as large sized samples.

18. On the Distribution of Wald’s Classification Statistic. H. L. Harter, 
Michigan State College, East Lansing. 

A study is made of the distribution of the classification statistic introduced by Wald.
The exact distribution of V in the univariate case, as obtained by the use of characteristic
functions and contour integration, is given for both degenerate and non-degenerate cases.
The problem of classifying an individual into one or the other of two populations, using
the statistic V, is discussed. In the multivariate case, examples are given of the distribution
of an approximation to V suggested by Wald. The procedure here consists of integrating
out two variables from the joint distribution of three variables to find the distribution of
the third. Four cases arise, depending upon whether the sample size and the number of
variates are even or odd. Since this approximation is valid only for large samples, an attempt
is made to find an approximation which is asymptotically equivalent to it as the
sample size increases, but which is valid also for small samples. Results are given for a
sampling experiment performed to determine an empirical distribution of V for a specific
small sampling case, using a population of 10,000 pieces modeled after Shewhart’s normal
bowl. Obstacles in the path of practical applications are discussed.

19. Analysis of Extreme Values. W. J. Dixon, University of Oregon, Eugene. 

Consider a population $N(\mu, \sigma^2)$ contaminated by introducing a certain proportion of
values from a population $N(\mu + \lambda\sigma, \sigma^2)$ or $N(\mu, \lambda^2\sigma^2)$. The performance of various statistics
for discovering these contaminators is assessed by sampling methods for samples of size 5
and 15. (This research was sponsored by the Office of Naval Research.)

20. A Note on the Variance of Truncated Normal Distributions. A. C. Cohen,
Jr., The University of Georgia, Athens.

Formulas are derived whereby the variance of truncated normal distributions can readily
be computed with the aid of an ordinary table of areas and ordinates of the normal
frequency function. These results are applicable to certain tolerance problems involved
in Statistical Quality Control. Their use will enable one to make computations required in
solving such problems without resorting to Karl Pearson’s relatively inaccessible tables
of "Values of the Incomplete Normal Moment Functions".
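
The abstract does not reproduce the formulas. As an illustration only, the sketch below uses the standard closed form for the variance of a normal distribution truncated to an interval, which needs nothing beyond the area and the ordinate of the standard normal curve at the two standardized truncation points. It is not claimed to be the author's derivation, and the function names are hypothetical.

```python
import math


def ordinate(z):
    """Ordinate of the standard normal frequency function at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_variance(mu, sigma, lower, upper):
    """Variance of N(mu, sigma^2) restricted to the interval [lower, upper].

    Either limit may be infinite (math.inf or -math.inf).
    """
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    retained = area(b) - area(a)                  # proportion not truncated
    # z * ordinate(z) tends to 0 at infinity; guard against inf * 0 = nan.
    a_term = a * ordinate(a) if math.isfinite(a) else 0.0
    b_term = b * ordinate(b) if math.isfinite(b) else 0.0
    shift = (ordinate(a) - ordinate(b)) / retained
    return sigma ** 2 * (1.0 + (a_term - b_term) / retained - shift ** 2)


# Half-normal check: truncating N(0, 1) at zero gives variance 1 - 2/pi.
print(truncated_normal_variance(0.0, 1.0, 0.0, math.inf), 1 - 2 / math.pi)
```

Here `math.erf` stands in for the printed table of areas; with a table, the same four quantities (two areas and two ordinates) would simply be looked up.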


21. Some Estimates and Tests Based on the r Smallest Values in a Sample
(By Title). J. E. Walsh, The Rand Corporation, Santa Monica, Calif.

Let us consider a situation where only the r smallest values of a sample of size n are available.
This paper investigates the case where n is large and r is of the form $pn + O(\sqrt{n})$.
Properties of some well known estimates and tests of the 100p% population point (based
on statistics of the type used for the sign test) are investigated. If the sample is from a
normal population, these nonparametric results have high efficiencies for small values of
p (at least 95% if p < 1/10). The other investigations are restricted to the case of a normal
population. Asymptotically "best" estimates and tests of the population percentage
points are derived for the case where the population variance is known. If the population
variance is unknown, asymptotically most efficient estimates and tests can be obtained
for the smaller population percentage points by suitable choices of p and $O(\sqrt{n})$. The
results of the paper have application in the field of life testing. There the r smallest sample
values can be obtained without the necessity of obtaining the remaining sample values.
By starting with a larger number of units but stopping the experiment when only a small
percentage have "died", it is often possible to obtain the same amount of "information"
with a substantial saving in cost and time over that required by starting with a smaller
number of units but continuing until all have "died".


22. Some Comments on the Efficiency of Significance Tests (By Title). J. E.
Walsh, The Rand Corporation, Santa Monica, Calif.


A method sometimes used to measure the efficiency of a significance test consists in
associating a statistic with the test and defining the efficiency of the test to be the efficiency
of this statistic considered as an estimate. This paper investigates the power function
implications of this method of defining the efficiency of a test. Examples are presented
which show that an estimate efficiency of $100E\%$ does not necessarily imply that the corresponding
most powerful test based on $100E\%$ as many sample values has approximately
the same power function as the given test (for the admissible set of alternative hypotheses).
In several of the examples it was found that estimate efficiency makes no allowance
for the effect of significance level while the relationship between the power functions of
the given test and the corresponding most powerful test changes noticeably with respect
to significance level. Some of these examples are non-asymptotic while others
are asymptotic. However, results are obtained for the asymptotic case which indicate that
this equality of power functions does hold for a rather broad class of significance tests if
the pertinent statistics have distributions which are asymptotically normal.

23.

Columbia University. Without the application of a sampling procedure the problem can only be 
solved either by a complete physical inventory which is very costly, or by a cycle check which takes 
many years to complete. By use of the sequential sampling method, results of desired accuracy are 
obtained quickly and at very low cost since an extremely small percentage of field inspection for the 
mass property accounts of any large utility produces satisfactory conclusions. 


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest. 

Personal Items 

Dr. Ralph A. Bradley accepted an appointment as Assistant Professor in the 
Mathematics Department of McGill University, Montreal, Canada after re¬ 
ceiving his Ph.D. in mathematical statistics at the University of North Carolina 
in June, 1949. 

Mr. Fred J. Clark, Jr. received his master of science degree in mathematics 
from the University of Illinois in August, 1949 and is now employed by the Uni¬ 
versity of California at the Sandia Laboratory in Albuquerque, New Mexico. 

Professor J. L. Doob is on leave from the University of Illinois to teach at Cor¬ 
nell University for the academic year 1949-1950. 

Mark W. Eudey obtained his Ph.D. degree in statistics at the University of 
California, Berkeley, and is now Vice President of California Municipal Statis¬ 
tics, Inc. 

Dr. Joseph L. Hodges, Jr. has been promoted to Assistant Professor and Re¬ 
search Associate at the Statistical Laboratory, University of California, Berkeley. 

Professor Paul Horst, formerly of the Department of Psychology, University of 
Washington, is now Director of Research at the Educational Testing Service, 
Princeton, New Jersey. 

Dr. Fred C. Leone, formerly an Instructor and a Research Fellow at Purdue 
University, has been appointed Instructor in the Mathematics Department and 
Director of the Statistical Laboratory at the Case Institute of Technology. 

Mr. Fred W. Lott, who has been studying at the University of Michigan for 
his Ph.D., has accepted an assistant professorship at Iowa State Teachers College, 
Cedar Falls, Iowa. 

Dr. Francis McIntyre has resigned as Director of Export Control, Office of 
International Trade, U. S. Department of Commerce, Washington, D. C. to 
accept a post as Director of Economic Research, California Texas Oil Co., 651 
Fifth Avenue, New York, New York. 

Mr. R. B. Murphy, who has been a graduate student at Princeton University 
has accepted an instructorship in the Mathematics Department of Carnegie In¬ 
stitute of Technology. 

Professor Jerzy Neyman, Director of the Statistical Laboratory, University of 
California at Berkeley, will be on sabbatical leave for the Spring Semester, 1950. 

Mr. Monroe L. Norden, formerly of the Glenn L. Martin Co., is now a Mathematical
Statistician with the Operations Research Office, Johns Hopkins University,
Ft. Lesley J. McNair, Washington 25, D. C.

Mr. D. Martin Sandelius, formerly a Research Assistant in the Institute of
Statistics, Uppsala, Sweden, has been appointed Lecturer in the Mathematics
Department, University of Washington, Seattle, for the academic year 1949-1950.

After completing his graduate work at Ohio State University, Dr. William J. 
Schull accepted a position with the Atomic Bomb Casualty Commission. He is 
now in Japan as a geneticist working on follow-up studies at Hiroshima. 

Miss Elizabeth L. Scott obtained her Ph.D. degree in statistics at the Univer¬ 
sity of California, Berkeley and was promoted to Lecturer and Research Asso¬ 
ciate at the Statistical Laboratory. 

Miss Ester Seiden obtained her Ph.D. degree at the University of California, 
Berkeley and was promoted to Lecturer and Research Associate at the Statistical 
Laboratory. 

Mr. Irving H. Siegel is on leave from his position as Chief Economist at the
Veterans Administration until June 30, 1950, to serve as Lecturer in Political
Economy at the Johns Hopkins University and as a member of the Johns Hopkins
University Operations Research Office staff.

Dr. Charles M. Stein, Assistant Professor and Research Associate at the
Statistical Laboratory, University of California, Berkeley, will be on leave for the
academic year 1949-1950 and will be working in Paris as a National Research
Fellow.


Alfred James Lotka 

Alfred James Lotka, a Fellow of the Institute, died in Red Bank, New Jersey,
on December 5, 1949. He was born of American parents in Poland, March 2, 1880,
and had his early schooling in France. His academic training was received at
Birmingham, England (B.Sc., 1901, and D.Sc., 1912), Cornell (M.A., 1909), and
Johns Hopkins (1922-1924). Dr. Lotka came to the Statistical Bureau of the
Metropolitan Life Insurance Company in 1924 and retired as Assistant Statistician
in 1947. His major contributions were his highly original work on the mathematical
theory of evolution, on the mathematical analysis of population, and on
the theory of self-renewing aggregates. Altogether, Dr. Lotka had almost 100
papers in these fields in technical and scientific journals, both here and abroad.
The essentials of his work are summarized in his books, “The Elements of Human
Biology” and “Théorie analytique des associations biologiques.” He was, in addition,
a joint author on several books in the field of public health.

Dr. Lotka was a past president of the American Statistical Association and of
the Population Association of America. He had recently been active as American
Vice-President of the International Union for the Study of Population.

Statistical Summer Session in Berkeley, Calif. 

Following the established pattern, there will be held this year a Statistical
Summer Session at the University of California, Berkeley. The faculty will include
William G. Cochran of Johns Hopkins University, Benjamin Epstein of
Wayne University, Erich L. Lehmann of the University of California, Paul Lévy
of the Ecole Polytechnique, Paris, France and Gottfried E. Noether of New York 
University. 

Courses will be offered on both the graduate and the undergraduate levels. The 
graduate courses, all given during the First Summer Session, June 19 to July 29, 
are meant primarily for students who either have already obtained their Ph.D. 
degree or are working toward it. No specific prerequisites to graduate courses will 
be required. The graduate program includes (i) a course on design of experiments 
and a seminar on analysis of variance by W. G. Cochran, (ii) a course on theory
of estimation by E. L. Lehmann, and (iii) a course and a seminar on random variables
and random functions by Paul Lévy.

Inquiries should be addressed to the Office of the Summer Sessions, 1A Ad¬ 
ministration Building, University of California, Berkeley 4, California. 


At a meeting of its Executive Council, AAPOR has laid plans for its 1950 
meetings to be held jointly with the World Association for Public Opinion Re¬ 
search (WAPOR) at Lake Forest College, near Chicago, June 16 to 20. 

The program which is now being planned will be designed to fit the needs of 
the Association’s membership, which is composed of leaders in both the academic
and commercial fields. 


The Council of the Institute of Mathematical Statistics requested Professor 
Harold Hotelling to communicate to Professor S. S. Wilks its appreciation of 
his editorship of the Annals during the years 1938 to 1949. On the recommendation
of the Council Professor Hotelling’s letter is reproduced below.

January 6, 1950 

Professor Samuel S. Wilks
Fine Hall 

Princeton, New Jersey 
Dear Professor Wilks: 

In behalf of the Council of the Institute of Mathematical Statistics and by its 
direction, I write to express the appreciation we all feel for the splendid efforts 
which you have expended so freely upon the Annals of Mathematical Statistics, 
and which have been so conspicuously successful in establishing it as a sound
and reputable journal. The years of your editorship are memorable ones for the 
history of statistics, and your contribution to making them so is of first im¬ 
portance. 

Very sincerely, 

Harold Hotelling 


New Members 

The following persons have been elected to membership in the Institute
(August 23, 1949 to November 30, 1949)

Anderson, Oskar, Ph.D. (Kiel) Professor, University of Munich, Königin-Strasse 69,
Munich (München), Germany.

Puente Arroyo, Felix Jorge, CPA (Univ. Nal. Litoral) Professor titular Mathematics,
Italia 1550, Rosario, Republica Argentina.

Arvanitis, Ernest A., A.B. (Boston Univ.) Student at Columbia University, 43-18 40th
Street, Sunnyside, L. I., New York 

Bhatt, Narbheshaniter M., Ph.D. (Edinburgh Univ.) Professor of Statistics, Commerce 
College, Behind Raopura Tower, Baroda, India 

Bose, Raj Chandra, D.Litt. (Calcutta Univ.) Professor of Mathematical Statistics, University
of North Carolina, 110 Noble Street, Chapel Hill, North Carolina.

Carreiro, Oscar Edlwaldo Porto, Civil Engineer (Univ. of Brasil) Professor da Faculdade
de Ciencias Economicas, Avenida Sao Sebastiao 266, Sao Paulo, Brazil.

Crump, Phelps P., B.S. (Iowa State) Graduate Student and Research Assistant, Bern 8457, 
State College Station, Raleigh, North Carolina 

Davis, Richard L., B.S. (North Carolina State) Sales Engineer, Box 304, Charlotte, North
Carolina 

Dickman, Sidney, A.B. (Brooklyn College) Graduate Student at Columbia University,
ISIS West 25th Street, Brooklyn 34, New York 

Fitzgerald, Rev. John F., S.J., M.S. (Univ. of Detroit) Assistant Professor of Physics and
Mathematics, College of the Holy Cross, Worcester 3, Massachusetts. 

Godsey, Ellis B., B.S. (Indiana Univ.) Analytical Statistician, Army Chemical Corps, 
1716 Pin Oak Road, Baltimore 4, Maryland 

Ghurye, S. G., M.Sc. (Univ. of Bombay) Student and assistant, Department of Mathematical
Statistics, c/o The Institute of Statistics, Phillips Hall, Chapel Hill, North
Carolina.


Gutt, Paul, M.S. (Univ of Chicago) Ordnance Research HI, Mathematician, 6431 S. 
Ellis, Chicago, Illinois 

Harman, Harry H., S.M. (Univ. of Chicago) Chief, Statistical Research and Analysis
Unit, Personnel Research Section, AGO, Dept. of the Army, 4111 Maryland Ave.
(Brookmont), Washington 16, D. C.

Henderson, Charles R., Ph.D. (Iowa State) Associate Professor, Animal Husbandry 
Department, Cornell University, Ithaca, New York. 

Harter, Harman Leon, Ph.D. (Purdue Univ.) Assistant Professor of Mathematics, Michigan
State College, East Lansing, Michigan.

Hoffman, William Charles, M.A. (Univ. of Calif. at Los Angeles) Graduate Assistant,
Department of Mathematics, Cornell University, Ithaca, New York. 

Hydeman, William Robert, M.A. (Syracuse Univ.) Mathematician, U. S. Navy Depart¬ 
ment, S8l0-S9lh Street, N.W., Washington 16, D. C. 

Kellerer, Hans, Ph.D., Referent, Bayerisches Statistisches Landesamt, München 8,
Rosenheimerstr. 130, Germany.

Kramer, Kenneth H., M.S. (Carnegie Inst. of Tech.) Teaching Assistant at Carnegie
Institute of Technology, 279 Seneca Street, Turtle Creek, Pennsylvania.

Lieberman, Gerald J., M.A. (Columbia Univ.) Engineer and Mathematical Statistician,
Statistical Engineering Laboratory, National Bureau of Standards, Washington
25, D. C.

Lindley, Dennis V., M.A. (Cantab.) University Demonstrator in Mathematics, Statistical
Laboratory, St Andrews Hill, Cambridge, England. 

Rasth*G A [ ri nJ- U.C.O.F.S, Bloemfontein, South Africa. 

CbWt ° l Statoti0al Department ’ State Serum Institute, 

ReCa0 df“mento 1 prnLt\- (U t n M J™ 6 ™ 1 *) £ ireotor de Estadistica, Minesterio 

versity Calle Pacultad C le ncias Economics, Central Uni- 

Riggs, S’ L Ph D a’ ^ EaPflZ ’” Chacao - Eetado Miranda, Venezuela. 

nartment of Pr0fessor ° f Mathematics, De¬ 

partment of Mathematics, Kent State University, Kent, Ohio. 


Saxer, Walter, Ph.D., Professor a.d. Eidg. Techn. Hochschule, Zürich, Goldbach-Küsnacht,
Switzerland.

Scobert, Whitney, M.S. (Univ. of Oregon) Associate Professor of Mathematics, Mathe¬ 
matics Department, Idaho State College, Pocatello, Idaho. 

Serfling, Robert E., Ph.D. (Univ. of Mich.) Senior Scientist, Officer in Charge, Statistical 
Branch, Epidemiology Division, Communicable Disease Center, U. S. Public Health 
Service, Atlanta, Georgia 

Steyn, Hendrik S., Ph.D. (Univ. of Edinburgh) Lecturer in Statistics, University of Pre¬ 
toria, SOS Fourth Private Avenue, Villieria, Pretoria, South Africa 

Zachorlas, William B., A.M. (Univ. of Pennsylvania) Instructor in Mathematics, Temple
University, 16X9 — 67th Avenue, Philadelphia £6, Pennsylvania 

Zeigler, R. K., Ph.D. (Univ. of Iowa) Associate Professor of Mathematics, Mathematics
Department, Bradley University, Peoria 5, Illinois. 


REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 

The twelfth Annual Meeting of the Institute of Mathematical Statistics was 
held in New York City on December 27-30, 1949. Headquarters were at the 
Biltmore Hotel where most of the sessions were held; one or more of the sessions 
were held at the Hotel Commodore, the McAlpin Hotel, and the Governor Clin¬ 
ton Hotel. The meeting was held in conjunction with the Annual Meeting of the 
American Statistical Association, the American Association for the Advance¬ 
ment of Science, the American Mathematical Society, the Econometric Society, 
the Psychometric Society, the Mathematical Association of America, the Asso¬ 
ciation for Computing Machinery, and the American Psychological Association. 
The following 214 members of the Institute attended: 

F. S. Acton, P. H. Anderson, R. L. Anderson, T. W. Anderson, H. E. Arnold, K. J. Arnold,
Max Astrachan, R. R. Bahadur, E. W. Bailey, T. A. Bancroft, W. D. Baten, E. E. Blanche,
C. I. Bliss, R. C. Bose, A. H. Bowker, R. A. Bradley, Dorothy Brady, A. E. Brandt, I. D. J.
Bross, T. H. Brown, O. P. Bruno, P. T. Bruyere, R. W. Burgess, J. M. Cameron, B. H. Camp,
E. W. Cannon, S. D. Canter, Bernard Carol, O. S. Carpenter, Maria Castellani, Jack Chassan,
Randolph Church, Edmund Churchill, W. G. Cochran, A. C. Cohen, Jr., R. H. Cole,
E. P. Coleman, F. G. Cornell, Jerome Cornfield, C. C. Craig, M. T. Crapsey, J. F. Daly,
D. A. Darling, Besse B. Day, F. R. Del Priore, W. E. Deming, Philip Desind, W. J. Dixon,
C. W. Dunnett, Solomon Dutka, P. S. Dwyer, Benjamin Epstein, W. D. Evans, W. T. Federer,
William Feller, J. W. Fertig, Leon Festinger, C. H. Fischer, J. C. Flanagan, M. M.
Flood, L. R. Frankel, N. M. Franklin, H. A. Freeman, Bernard Friedman, Melitta L. Garbuny,
E. F. Gardner, M. A. Geisler, H. H. Germond, Leon Gilford, Abraham Golub, William
Gomberg, C. H. Graves, S. W. Greenhouse, J. A. Greenwood, Evelyn S. Grossman, H. T.
Guard, Carl Hammer, E. C. Hammond, H. H. Harman, T. E. Harris, Boyd Harshbarger,
H. L. Harter, W. A. Hendricks, L. H. Herbach, J. L. Hodges, Jr., Wassily Hoeffding, Helen
M. Humes, Harold Hotelling, Cuthbert Hurd, H. M. Hughes, W. R. Hydeman, S. M. Ikhtiar-ul-Mulk,
S. I. Isaacson, Marcus Jacobs, W. W. Jacobs, J. E. Jackson, Carol M. Jaeger,
J. B. Jeming, R. J. Jessen, H. L. Jones, Alice S. Kaitz, W. C. Kalinowski, Leo Katz, R. D.
Keeney, B. F. Kimball, Leslie Kish, Lila F. Knudsen, Paul Koditschek, C. F. Kossack, K.
H. Kramer, R. R. Kuebler, Jr., S. M. Kwerel, R. B. Ladd, Marguerite Lehr, F. C. Leone,
Joseph Lev, Howard Levene, G. J. Lieberman, Julius Lieblein, S. B. Littauer, Simon Lopata,
Irving Lorge, E. D. Lowry, L. H. Madow, W. G. Madow, Benjamin Malzberg, Joseph Mandelson,
E. S. Marks, Margaret P. Martin, J. W. Mauchly, P. J. McCarthy, Margaret Merrell,
Albert Mindlin, P. D. Minton, Robert Mirsky, A. M. Mood, Doris N. Morris, R. H. Morris,
Dorothy J. Morrow, J. W. Morse, J. E. Morton, Judith Moss, R. G. Moss, Frederick Mosteller,
C. M. Mottley, Hugo Muench, L. F. Nanni, Doris Newman, G. E. Noether, M. L. Norden,
J. A. Norton, Jr., H. W. Norton, E. G. Olds, P. S. Olmstead, A. L. O'Toole, W. R.
Pabst, Jr., R. E. Patton, Katherine Pease, G. W. Petrie, B. E. Phillips, E. W. Pike, Aditya
Prakash, Frank Proschan, J. E. Raup, L. J. Reed, J. S. Rhodes, P. R. Rider, H. G. Romig,
Norman Rudy, Marion M. Sandomire, F. E. Satterthwaite, Mary Ann Savas, M. A. Schneiderman,
Samuel Schweid, O. A. Shaw, G. D. Shellard, W. A. Shewhart, S. S. Shrikhande,
Harry Shulman, I. H. Siegel, Rosedith Sitgreaves, G. W. Snedecor, Herbert Solomon, D. E.
South, Mortimer Spiegelman, R. G. D. Steel, J. R. Steen, Arthur Stein, Joseph Steinberg,
F. F. Stephan, A. I. Sternholl, J. S. Stock, J. G. Strioby, J. V. Sturtevant, W. R. Thompson,
L. J. Tick, Gerhard Tintner, M. M. Torrey, J. W. Tukey, G. W. Tyler, S. A. Tyler, Uttam
Chand, D. F. Votaw, Jr., Helen M. Walker, W. A. Wallis, Samuel Weiss, E. L. Welker, D. R.
Whitney, Frank Wilcoxon, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, M. A. Woodbury,
Holbrook Working.


The opening session on Tuesday, December 27, 9 A. M., held jointly with the
American Statistical Association and the American Mathematical Society, was
devoted to Operations Research, with Professor J. Steinhardt, Operations Evaluation
Group, Massachusetts Institute of Technology presiding. The following
papers were presented:


1. Topics on the Methodology of Operations Research. B. O. Koopman, Columbia University.

2. Some Applications of the Mathematical Theory of Games. G. E. Kimball, Columbia University.

3. Theory of Games. L. Gillman, Operations Evaluation Group, Massachusetts Institute
of Technology.

4. Development of Theories of Action. Ellis Johnson, Operations Research Office, The Johns
Hopkins University.

5. Some Industrial Applications of Operations Research. A. A. Brown, Operations Evaluation
Group, Massachusetts Institute of Technology.


At the second session, held jointly with the American Statistical Association,
at 2:30 P. M. on the opening day, Professor M. Loève, University of California,
gave a special invited address entitled, Fundamental Limit Theorems in Probability.
The discussion was presented by Professor Will Feller of Cornell University
and Professor H. E. Robbins of the University of North Carolina.
Professor Abraham Wald of Columbia University served as chairman.

The first contributed papers session was held on the same day at 4:00 P. M.,
with Professor W. D. Baten of Michigan State College and Michigan Agricultural
Experiment Station as chairman. The following papers were presented:

1. The Asymptotic Distribution of the Extremal Quotient. E. J. Gumbel, New York, and R.
D. Keeney, Metropolitan Life Insurance Company, New York.

2. A Second Formula for Partial Sums of Hypergeometric Series Having the Unit as Fourth
Moment. Hermann von Schelling, Naval Medical Research Laboratory, New London,
Connecticut.

3. A Coverage Distribution. Herbert Solomon, Office of Naval Research, Washington,
D. C.

4. The Problem of the Greater Mean. R. R. Bahadur and Herbert Robbins, University of
North Carolina.

5. Some Extensions of Bayes' Theorem. F. C. Leone, Case Institute of Technology.

6. On Optimum Selections from Multinormal Populations. Z. W. Birnbaum and D. G. Chapman,
University of Washington.

On Wednesday morning, December 28, at 10:00 A. M., a session on Cybernetics
was held jointly with the American Statistical Association and the
American Mathematical Society. The following papers were given:

1. Technique of Multiple Prediction. Norbert Wiener, Massachusetts Institute of Technology.

2. Stochastic Problems in Neurophysiology. Walter Pitts, Massachusetts Institute of
Technology.

3. Information Theory. Claude Shannon, Bell Telephone Laboratories.

with discussion by Professor J. L. Doob, University of Illinois, Professor Mark 
Kac, Cornell University, and Professor L. J. Savage, University of Chicago. 
Professor Jerzy Neyman, University of California was Chairman of the session. 

The session on Review of Statistical Methodology was held jointly with the 
American Statistical Association at 2:00 P. M., Wednesday, December 28, with 
Professor W. A. Wallis, University of Chicago, as chairman. The two papers 
presented were: Review of Statistical Methodology in Agriculture and Related 
Fields, by Professor W. T. Federer, Cornell University and Recent Developments 
in Statistical Methodology in Social Science, by Professor Frederick Mosteller, 
Harvard University; discussion followed by Professor L. J. Savage of the Uni¬ 
versity of Chicago. 

The second session of contributed papers was held jointly with the American 
Statistical Association and the Econometric Society on Thursday, December 
29, at 10:00 A. M., with Professor H. T. Davis of Northwestern University 
presiding. The following papers were presented: 

1. Simple Regression Analysis with Autocorrelated Disturbances. Howard Jones, Illinois
Bell Telephone Company.

2. Application of Sequential Sampling Method to Check the Accuracy of a Perpetual Inventory
Record. Joseph Jeming, New York City.

3. A Test of Klein’s Model III for Changes of Structure. Andrew Marshall, Rand Corporation.

4. Application of the Theory of Extreme Values to Economic Problems. S. B. Littauer,
Columbia University and E. J. Gumbel, New York City.

5. Bias Due to the Omission of Independent Variables in Ordinary Multiple Regression
Analysis. T. A. Bancroft, Iowa State College.

6. Estimating Parameters of Pearson Type III Populations from Truncated Samples. A. C.
Cohen, Jr., University of Georgia.

7. The Circular Normal Distribution. E. J. Gumbel, New York City.

The third session of contributed papers was held at 2:00 P. M. on Thursday, 
December 29, with Professor L. C. Aroian of Hunter College as Chairman. The 
following papers were presented in person or by title as indicated: 

1. Treatment of Attenuation Problems by Random Sampling. H. Kahn and T. Harris, The
Rand Corporation.

2. On the Existence of Nearly Locally Best Unbiased Estimates. Herman Rubin, Stanford
University.



3. The Experimental Evaluation of Multiple Definite Integrals. George Tyler, Navy Electronics
Laboratory, San Diego, California.

4. Tests of Fit of a Cumulative Distribution Function Over Partial Range of Sample Data.
Bradford Kimball, New York State Department of Public Service, New York City.

5. Large Sample Tests for Comparing Percentage Points of Two Arbitrary Continuous
Populations. A. W. Marshall and John Walsh, The Rand Corporation.

6. On the Distribution of Wald’s Classification Statistics. Harman L. Harter, Michigan
State College.

7. Analysis of Extreme Values. W. J. Dixon, University of Oregon.

8. A Note on the Variance of Truncated Normal Distributions. (By title) A. C. Cohen, Jr.,
University of Georgia.

9. Some Comments on the Efficiency of Significance Tests. (By title) John Walsh, The Rand
Corporation.

10. Some Estimates and Tests Based on the Smallest Values in a Sample. (By title) John
Walsh, The Rand Corporation.


The subject of the next session, 4:00 P. M. Thursday, December 29, was the 
Review of Stochastic Processes from the Point of View of Mathematical Statistics. 
This session was held jointly with the American Statistical Association, Pro¬ 
fessor C. C. Craig of the University of Michigan presiding. Two papers were 
given, one by Professor H. B. Mann of the National Bureau of Standards, Ohio
State University and the University of California; and the second by Professor 
John Tukey, Princeton University. 

On Friday, December 30, at 9:00 A. M., a session on Statistical Methods in
Astronomy was held jointly with the American Statistical Association and Section
D of the American Association for the Advancement of Science. Professor
Walter Bartky of the University of Chicago, Chairman of the session, opened the 
meeting with introductory remarks on Astronomical Problems Requiring Sta¬ 
tistical Methods. The following papers were presented: 

1. The Nearby Stars. Peter van de Kamp, Swarthmore College.

2. Corrections to Observed Frequency Distributions. Bart J. Bok and J. K. De Jonge, Harvard
University.

3. The Problem of Selective Identifiability of Binaries. Elizabeth Scott, University of
California.

4. Multivariate Periodogram Analysis and Detection of Variable Stars. Harold Hotelling,
University of North Carolina.


These papers were discussed by Professor Jerzy Neyman, University of 
California. 

The session on Discriminant Functions in Education was held jointly with the 
American Statistical Association, the American Psychological Association and 
the Psychometric Society. Professor T. W. Anderson of Columbia University 
gave an invited address on Classification by Multivariate Measures, followed by
discussion by Professors J. C. Flanagan of the University of Pittsburgh and
John Carroll of Harvard University. Professor Robert Thorndike of Columbia
University presided.

A session of the meeting was devoted to Computation and was held
jointly with the American Statistical Association and the Association for Computing
Machinery, Professor Harold Hotelling of the University of North Carolina
serving as Chairman. The following papers were given:





1. Idiosyncrasies of Automatically-sequenced Digital Computing Machines. Ida Rhodes, National
Bureau of Standards.

2. Problem Solving on Large-Scale Automatic Calculating Machines. W. D. Woo, Harvard 
University. 

3. A Statistical Application of the UNIVAC. John Mauchly, Eckert-Mauchly Computer 
Corporation. 

These papers were discussed by James McPherson, Bureau of the Census and 
Emil Schell, Office of the Air Comptroller. 

Meetings of the Council were held on Tuesday, December 27, at 12:00 Noon, 
Professor Jerzy Neyman presiding and again on Thursday, December 29, at 
12:00 Noon, Professor J. L. Doob presiding. The Business Meeting was held on 
Wednesday, December 28, Professor Jerzy Neyman presiding. The report of 
this meeting is given elsewhere in this issue. 

S. B. Littauer, 
Associate Secretary 


MINUTES OF THE ANNUAL MEMBERSHIP MEETING, NEW YORK, 

DECEMBER 28, 1949 


The meeting was called to order at 4:30 P.M. by President Jerzy Neyman. 
The annual reports of the President, Editor, and Secretary-Treasurer were read. 
They are printed elsewhere in this issue. 

It was moved by Harold Hotelling that the front cover of the Annals in the 
future shall bear the additional notation that it was edited during the years 
1938-1949 by S. S. Wilks. Motion was seconded and carried unanimously. 

The tellers reported the election of the following officers: 

President-Elect P. S. Dwyer 

Members of the Council for 1950-1952 David Blackwell 

W. G. Madow 
Frederick Mosteller 
L. J. Savage 

Meeting was adjourned at 5:15 P.M. 

Carl H. Fischer
Secretary 


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1949. 

I wish to begin my Report by welcoming the newly elected Fellows, Doctors 
Z. W. Birnbaum, D. J. Finney, H. O. Hartley, Wassily Hoeffding, Michel
Loève, Edward Paulson and S. N. Roy. In addition, a hearty welcome is due
to Dr. G. W. Brown who was elected last year, but inadvertently omitted in the
published list. The election to the fellowship is a mark of recognition on the part 
of the Institute. At the same time, I am sure the Institute has reason to be proud 
of having among its fellows such distinguished scholars as are now added to the 
list. 





During the past year the intensity of the Institute's life grew markedly in 
many respects. In particular, a very considerable number of our members took 
part in various Committees. For the sake of brevity, the composition of all the 
Committees is given in a tabular form at the end of the Report. At this time I 
wish to express the indebtedness of the Institute to the Chairmen and to the
Members of all the Committees. 

Undoubtedly the most important function of the members of the Institute is 
research and the most important function of the Institute itself is the publica¬ 
tion of the results of this research. In this respect the past year brought about a 
fundamental change: after a dozen years of hard and most fruitful work as 
Editor of the Annals, Professor S. S. Wilks resigned this year and the Council
elected Professor T. W. Anderson as his successor. According to our present 
Constitution, the term of office of the Editor is three years. 

About a decade ago I suggested and the Membership Meeting of the Institute 
approved that the cover of the Annals bear the name of its founder, Professor 
Harry Carver. Founded by Carver, the Annals were developed by Wilks and
now stand as the most important statistical journal in the world. Accordingly
the Chair will welcome a motion to add Professor Wilks’ name as a permanent
feature of the cover of the Annals of Mathematical Statistics.

While being grateful to Wilks and regretting his withdrawal, we should extend
a most hearty welcome to T. W. Anderson. Because of his scholarship,
broad vision combined with broadmindedness and because of his energy, he is
an excellent promise for the future of the Annals. It is a pleasure to express the
gratitude of the Institute to Columbia University and, in particular, to Dr.
Abraham Wald for providing the necessary facilities for the Editorial office of
the Annals.

Prior to embarking on the election of the new Editor, the I.M.S. Council
approved an important document prepared by a special Committee chaired by 
S. S. Wilks, formulating the editorial policy of the Institute. 

Of the many fundamental parts of this document I wish to mention the fol¬ 
lowing: 


(i) "In establishing the editorial procedure, special care should be taken to
avoid the danger of the Annals becoming a one-group journal rather than
serving the Institute as a whole... the refusal to publish a paper on
grounds of general policy (rather than because of some verifiable defects
such as mistakes, triviality, lack of new material, etc.) shall be based on
a unanimous agreement of the Editor and of all the Associate Editors."

The idea behind these passages is, of course, that thus far, the Annals
is the only journal published by the Institute and should provide facilities for
all the different schools of thought. My understanding is that this includes the
biostatistician Cochran and the econo-statistician Koopmans, the multivariate
[name illegible] and the tolerant Wilks, the quality-control-minded Shewhart and
the dependently-limiting Loève, the necessary- and sufficient-normal Feller and
the [illegible], the relativistically-cybernetic Wiener and
the general-sequential-decision-maker Wald. I should think that even our next





President, the stochastically-processed-Markovian Doob, is meant to have a 
chance to publish in the Annals, from time to time. 

(ii) Another interesting point in the same document concerns the proposed
approximate distribution of space in the Annals:

(a) research papers on mathematical statistics proper, 60 per cent;

(b) research papers in borderline fields, including applications, 20 per cent;

(c) expository papers, 15 per cent;

(d) news, notices, etc., 5 per cent.

Since in the past there was too little expository material, the Council instituted
the so-called Special Invited Papers, to be presented from time to time on
selected subjects. The text of these papers, accompanied by the prepared dis¬ 
cussion, will be printed in the Annals. The program of the present meeting in¬ 
cludes our first Special Invited Paper, by Michel Loeve. It is hoped that the 
Special Invited Papers will satisfy the need for expository material now felt by 
the membership of the Institute. I am sure the Program Committees will appre¬ 
ciate suggestions of the Members regarding the sections of the theory requiring 
expository presentation. 

The financial aspect of the publication program of the Institute was a con¬ 
tinued worry of the Council. As is well known, the Annals is overloaded with 
papers and the cost of printing is growing constantly. In order to ease the situ¬ 
ation somewhat, our new Constitution was amended to include the provision 
that the Universities and other institutions could become Institutional Members. 
There is already some additional income from this source and, if all the members 
of the Institute are energetic in urging their Departments to become Institu¬ 
tional Members, this income may be quite substantial. 

It is conceivable that some potential sources of funds exist, not directly avail¬ 
able for the Annals, which may be used for starting a new statistical journal. 
In order to investigate this possibility a special committee was appointed under 
the chairmanship of Professor Scheffé. This Committee did an excellent job in
trying to find a solution of the tremendously difficult problem and there is now 
a reasonable hope that, in the not very distant future, our publication facilities 
will be increased. 

Another deep change in the structure of the Institute occurred this year. 
Here I have in mind the resignation of Dr. Paul S. Dwyer, our long and hard 
working Secretary, and the taking over by Dr. Carl Fischer. Dr. Dwyer’s resig¬ 
nation was announced last year at the meeting at Cleveland and we expressed 
to him our hearty thanks for his untiring work for the Institute. I wish to repeat 
these thanks now and to accompany them by the hearty congratulations on the 
excellent program he prepared for this meeting in his new capacity as the Chairman
of the National Program Committee.

Until recently, there was a certain disequilibrium in the location of the meetings
of the Institute. Practically all of the meetings were held in the East and
the West Coast members could attend them only as a matter of exceptional luck.
Later, regional meetings were organized, and this year we have functioning three 





Regional Program Committees, one for the East, one for the West Coast and
one for the Middle West. In addition, we have Program Committees for the two 
National Meetings of the Institute. In parallel with the redistribution of meet¬ 
ings, there was an increase in their number. This process was accompanied by 
the very efficient help on the part of the governmental organizations, of the 
Office of Naval Research, the Air Force, and the Army, for the members of the 
Institute to attend the meetings even if they are held at a considerable distance. 
As a combined result of these developments it now may seem that there are too 
many meetings. Undoubtedly, the number and the location of future meetings 
of the Institute will be seriously discussed and adjusted to the existing needs. 

Naturally, the help of the Governmental institutions was not limited to help 
in travel. A considerable number of research projects in statistics are now in 
progress in many institutions with excellent results for science, for the younger 
people who are given the chance to make their first independent research work 
without undue worry about food and shelter and, thus, for the country as a whole.
The first organization to support fundamental research in general, and in statistics,
in particular, seems to be the Office of Naval Research. Its broadmindedness
and understanding of the spirit of research have established a very high
standard which is also sustained by other institutions. If permitted to function 
as they do now, these institutions will mark an epoch in the development of 
scholarly work in this country. 

The following persons have accepted the appointment to the Nominating 
Committee for the next year:

Henry Scheffé, Chairman
Albert H. Bowker
Paul G. Hoel 
Leonid Hurwicz 
Herbert E. Robbins 
David F. Votaw, Jr. 


Composition of the Committees of the Institute in 1949 


1. Program Committees (P.C.)

(i) Eastern P.C. for the April 1949 meeting in New York
Churchill Eisenhart, Chairman
W. G. Cochran
C. F. Kossack
S. B. Littauer
F. Mosteller

(ii) West Coast P.C. for the June meeting in Berkeley
M. A. Girshick, Chairman
Z. W. Birnbaum
W. J. Dixon
J. L. Hodges, Jr.
P. G. Hoel
A. M. Mood

(iii) National P.C. for the Summer Meeting at Boulder, Colorado
W. Feller, Chairman
J. L. Doob
M. A. Girshick
C. C. Hurd
J. Wolfowitz

(iv) Mid West P.C.
C. C. Craig, Chairman
L. Hurwicz
W. G. Madow
K. May
L. J. Savage
D. R. Whitney

(v) National P.C. for the December meeting in New York
P. S. Dwyer, Chairman
J. Berkson
G. W. Brown
C. Eisenhart
Mark Kac
H. Rubin

(vi) Eastern P.C. for the Spring 1950 meeting in North Carolina
H. Hotelling, Chairman
D. Blackwell
H. Geiringer
S. B. Littauer
D. F. Votaw, Jr.
S. S. Wilks

2. Committee for Special Invited Papers
J. W. Tukey, Program Coordinator, Chairman ex officio
C. C. Craig
P. S. Dwyer
C. Eisenhart
W. Feller
M. A. Girshick
H. Hotelling

3. Committee on Editorial Policy (1948-1949)
S. S. Wilks, Chairman
W. G. Cochran
W. Feller
M. A. Girshick
P. S. Olmstead
J. Neyman
W. A. Wallis
J. Wolfowitz

4. Committee to Nominate Candidates for the Editor of the Annals
Harry C. Carver, Chairman
David Blackwell
S. Lee Crump
Erich L. Lehmann
Howard Levene
Frederick Mosteller
Herbert E. Robbins

5. Committee on Tabulation
C. Eisenhart, Chairman
C. I. Bliss
F. W. Dresch
H. H. Germond
H. O. Hartley
C. C. Hurd
A. N. Lowan
W. G. Madow
H. G. Romig
L. E. Simon

6. Committee on Directory
John W. Tukey, Chairman
Churchill Eisenhart

7. Committee to Revive the Statistical Research Memoirs
Henry Scheffé, Chairman
T. W. Anderson
Walter Bartky
C. C. Hurd
George Kuznets






8. Rietz Lectures Committee

The Chairmanship of this Committee was accepted by Abraham Wald,
the first Rietz Lecturer, who undertook to make further appointments. These
are:

C. C. Craig
W. Feller

9. Committee to Encourage Membership outside of the United States 
T. W. Anderson, Chairman 

C. C. Hurd 
M. Loève
J. Marschak

10. Committee on Statisticians in the Government Service
W. E. Deming, Chairman 

C. Eisenhart 

11. Representative of the I.M.S. to the American Association for the Advancement 
of Science 

Harold Hotelling 

12. Representative of the I.M.S. to the National Research Council, Division of 
Physical Sciences 

Walter Bartky (1948-1950) 

13. Representative of the I.M.S. to the Mathematical Policy Committee 

S. S. Wilks 

14. Representative of the I.M.S. to the Joint Committee for Development of Statistical 
Applications in Engineering and Manufacturing 

Benjamin Epstein 

15. Representatives to the Inter-Society Cooperation on Mathematical Training of 
Social Scientists 

T. W. Anderson
J. L. Doob
S. S. Wilks

16. Committee to Determine the Duties and Responsibilities of the Program Com¬ 
mittees 

Harold Hotelling, Chairman 
M. A. Girshick 
S. B. Littauer 


December 31, 1949


J. Neyman
President 


REPORT OF THE SECRETARY-TREASURER OF THE INSTITUTE 

FOR 1949 

At the beginning of 1949 the Institute had 1101 members and during the 
period covered by this report 153 new members (8 of whom begin their member¬ 
ship with 1950) joined the Institute and two members were re-instated. During 
1949 the Institute lost 87 members of which [text illegible]
suspension for non-payment of dues. Judging from the information available at
this date, the Institute will have 1167 members as it starts 1950. 

During 1949 the Constitution was amended to provide for a new class of mem¬ 
bership: Institutional Membership. Although the campaign for institutional 
members started late in the year, by December 31 there were five universities on 
the rolls: California, Purdue, Illinois, Princeton and North Carolina. It is hoped 
that many more universities and corporations will enroll during 1950. 

Meetings of the Institute held during 1949 included those at Columbia Uni¬ 
versity on April 8-9, at the Berkeley campus of the University of California on 
June 16-18, at the University of Colorado on August 29-September 1, and at 
New York City on December 27-30. The Secretary wishes to call attention to 
the excellent work of the members who served as Assistant and Associate Secre¬ 
taries at these meetings: Professor S. B. Littauer at New York, Professor J. L. 
Hodges, Jr., at California, Professor H. T. Guard at Colorado and Associate 
Secretary Professor Littauer who was responsible for the New York Meeting. 

The following Fellows served as members of the Committee on Fellows: C. 
C. Craig, chairman, T. W. Anderson, M. A. Girshick, Harold Hotelling, Henry 
Scheffé, and F. F. Stephan.

The meeting scheduled for November 25-26 at the University of California 
at Los Angeles was cancelled by vote of the West Coast membership because of 
the proximity of the Boulder and Christmas Meetings. 

At the Council meeting at Boulder, August 29, 1949, the following Associate 
Secretaries were elected: 

Associate Secretary     Section

S. B. Littauer          Eastern
K. J. Arnold            Central
J. L. Hodges, Jr.       Western

By a mail vote of the Council, conducted during October, 1949, T. W. Ander¬ 
son was elected Editor for the period 1950-1952. 

A summary of the financial status of the Institute is given below: 


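FINANCIAL STATEMENT
December 20, 1948 to December 31, 1949

A. RECEIPTS

Balance on Hand,* December 20, 1948 ..........  $ 7,121.01
Dues ..........................................    7,826.36
Contributions .................................      166.16
Life Memberships ..............................      392.50
Institutional Memberships .....................      400.00
Subscriptions .................................    4,779.07
Sale of Back Issues ...........................    3,314.41
Biometrika ....................................      793.60
Income from Investments .......................      100.00
Miscellaneous .................................      169.70

Total .........................................  $25,052.69


* In bank deposits and government bonds.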







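B. EXPENDITURES

Annals, Current
    Office of the Editor ......................................  $   275.00
    Waverly Press .............................................    3,777.05     $ [?],052.85

Annals, Back Numbers
    Reprinted Vol. II #4; III #4; IV #3 & #4; V #1; VI #1, 2, 3 & 4;
    XIII #1, 2 & 4 ..........................................................   $ 2,910.55

Mathematical Reviews and Inter-Society Committee ............................       200.92

Office of the Secretary-Treasurer
    Printing, memoranda, etc. (including some stamped envelopes)  [?],150.01
    Postage, supplies, express, telephone calls ...............      276.00
    Clerical help .............................................    2,208.40
    Travelling expense ........................................      223.61     $ 3,808.02

Miscellaneous ...............................................................   $ 57
Biometrika ..................................................................   $ [?]57.30
Balance on Hand,* December 31, 1949 .........................................   $ 7,982.08

Total .......................................................................   $25,052.69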


C. SUMMARY OF RECEIPTS AND EXPENDITURES

Balance on Hand,* December 20, 1948 ..........  $ 7,121.01
Receipts during 1949 ..........................   17,931.68
Expenditures during 1949 ......................   17,070.61
Balance on Hand,* December 31, 1949 ..........  $ 7,982.08

D. LIFE MEMBERSHIP FUNDS

It has been the practice to set up an amount equal to all life membership payments as a
liability and to hold all these funds in reserve until the death of the member, after which
his payment is released to the general fund. There were three new life membership payments.

                               December 20, 1948    December 31, 1949
Number of Life Members ......          29                   32
Total Reserve Held ..........      $2,280.00            $2,672.50

E. BACK ISSUES FUND

It has been our policy, since January 1, 1948, to use income from the sale of back issues to
finance the additional reprinting of back issues.

Previous balance in back issues fund ...................  $  749.77
Income from the sale of back issues during 1949 ........   3,314.41
Expense for reprinting back issues in 1949 .............   2,910.55
Balance, December 31, 1949 .............................  $1,153.63


F. BALANCE SHEET, DECEMBER 31, 1949

ASSETS
                                                   December 31, 1949    Increase since
                                                                        December 20, 1948
Cash ...........................................      $ 3,094.08          $  861.07
U. S. Government G Bonds .......................        3,000.00              -
U. S. Government F Bonds (Purchase price) ......        1,888.00              -
Current Accounts Receivable ....................          545.78             254.66
Estimated Value (Cost of Back Annals**) ........       16,459.22           3,673.61

                                                      $24,987.08          $4,789.24

* In bank deposits and government bonds.
** Cost of Annals calculated at 67 cents per copy.

LIABILITIES

Reserve for Life Memberships ...................      $ 2,672.50          $  392.50
Reserve for Reprinting Back Issues .............        1,153.63             403.86
Surplus ........................................       21,160.95           3,992.88

                                                      $24,987.08          $4,789.24


G. SUMMARY

The surplus of the Institute has increased during the year of 1949 by $3,992.88. While this
indicates a favorable condition, it should be noted that roughly 92% of this gain is represented
by an increase in the inventory of back issues of the Annals. This asset is definitely
of the non-liquid sort and thus the major portion of our gain is of little assistance in meeting
our current need for more publication space in the Annals.

It should be noted that the year-end statements have always included a substantial
amount in prepaid dues and subscriptions on the asset side without a corresponding liability.
The figure for December 20, 1948 is $4,060.50 and for December 31, 1949 is $4,682.37.
Thus it will be seen that we are virtually running on a hand-to-mouth basis. It is hoped that
an increase in the number of individual and institutional memberships during 1950 will
bring us into a more favorable situation.

Beginning with January 1, 1950 we plan to revise the bookkeeping system which is no
longer adequate for an organization of our present size. In the future, these reports will be
made on an accrual basis rather than a cash basis and thus will present the data pertaining
to each year on a more realistic basis.

We are now in a position to supply all issues beginning with Volume 1. Five or six of the 
back issues are in short supply, but we expect to be able to reprint these when our supplies 
become exhausted, using receipts from the sale of back issues to pay for the reprinting. 

Carl H. Fischer 

December 31, 1949 Secretary-Treasurer 


REPORT OF THE EDITOR OF THE ANNALS FOR 1949

The 1949 volume of the Annals exceeded, by a few pages, the 600 pages budgeted
for it at the beginning of the year. A total of 66 papers were published, as
well as the usual reports, abstracts, and items of news and notices. The 1949 
volume was Volume 20 of the Annals, and it seemed fitting to publish a cumula¬ 
tive index of papers for the first twenty volumes of the Annals. Such an index, 
containing both author and subject indexes, has been published as a separate 
31-page pamphlet and is being distributed with the December 1949 issue of 
the Annals. 

The rate of submission of manuscripts continues to increase. By the end of 
1949 enough manuscripts to fill two issues of the Annals had been accepted for 
publication. At the same time approximately forty manuscripts were at various 
stages of refereeing and revision. This means that authors submitting manu¬ 
scripts at the beginning of 1950 can hardly expect to see their papers in print in 
less than a year. The average gap between submission of manuscripts
and their appearance in print has, for the last two years, increased at a rate of about
two issues (six months) per year. There is no reason to predict that this rate will
change for at least another year or two. Thus, it is highly desirable that every 
effort be made to expand the publication program of the Institute during 1950. 





The most immediate possibility would be to expand the Annals by at least 100 
pages if the budget will permit. In the meantime, it is hoped that the Institute 
committee to study the feasibility of reviving the Statistical Research Memoirs 
will be able to work out a practical plan for further increasing the publication
facilities of the Institute. 

The manuscripts being submitted continue to cover a wide range of topics in 
probability and statistics. There is still a scarcity of good review and expository 
articles being submitted, but with the institution of special invited addresses so 
widely discussed at the Cleveland meeting of the Institute in December, 1948,
we can expect to receive more review and expository articles in the future.

The Editor takes this opportunity to acknowledge, on behalf of the Editorial 
Committee, the refereeing assistance which has been generously given during the 
year by the following persons: A. C. Aitken, E. W. Barankin, Z. W. Birnbaum,
R. C. Bose, A. H. Bowker, G. W. Brown, K. L. Chung, W. J. Dixon, A.
Dvoretzky, Hilda Geiringer, L. A. Goodman, T. N. E. Greville, F. E. Grubbs,
John Gurland, M. H. Hansen, T. E. Harris, H. O. Hartley, E. L. Kaplan, B. F.
Kimball, T. Koopmans, Julius Lieblein, H. Levene, M. S. MacPhail, P. J.
McCarthy, R. B. Murphy, G. E. Noether, E. G. Olds, P. S. Olmstead, Richard
Otter, E. Paulson, M. P. Peisakoff, E. J. G. Pitman, Milton Sobel, D. F. Votaw,
Max Woodbury, and J. L. Walsh.

Thanks are due to Mr. M. E. Freeman, Mr. L. A. Goodman and Mr. E. F.
Whittlesey for preparation of manuscripts and to Mrs. Lily D. Smith for other
editorial and office assistance in connection with the Annals.

Finally, on behalf of the Editorial Board, which has had the responsibility for 
editing the Annals since 1938, the Editor extends every good wish to the new 
Editor, T. W. Anderson, and the new Editorial Board, who will inherit nearly
a full year of accepted manuscripts but will otherwise assume editorial responsi¬ 
bility for the Annals beginning with the 1950 volume. 


December 21, 1949


S. S. Wilks
Editor



THE IDENTIFICATION OF STRUCTURAL CHARACTERISTICS¹

By T. C. Koopmans and O. Reiersøl
Cowles Commission for Research in Economics 

1. Introduction. 

1.1. "Population" versus "structure." In a fundamental paper (Fisher, [1])
R. A. Fisher distinguished as the first group of problems in mathematical statistics
the "specification of the mathematical form of the population from which
the data are regarded as a sample." It is the purpose of this article to suggest a
reformulation of the specification problem, appropriate to many applications 
of statistical methods, and to point out the consequent emergence of a new 
group of problems, to be called identification problems. 

In many fields the objective of the investigator’s inquisitiveness is not just
a "population" in the sense of a distribution of observable variables, but a
physical structure projected behind this distribution, by which the latter is
thought to be generated. The word "physical" is used merely to convey that
the structure concept is based on the investigator’s ideas as to the "explanation"
or "formation" of the phenomena studied, briefly, on his theory of these phenomena,
whether they are classified as physical in the literal sense, biological,
psychological, sociological, economic or otherwise. Examples of such structures,
drawn from the fields of economic fluctuations and of psychological factor
analysis, are given in sections 3 and 4. More detailed discussions of these examples
can be found in other publications by the present authors and by others
[15], [19]. In this article, we are therefore not concerned with the merits of particular
assumptions entering into the specifications considered. Our examples
are used only as the basis for a generalizing formulation (Section 2) and a comparative
discussion (Section 5) of the identification problem, i.e., the problem
of drawing inferences from the probability distribution of the observed variables 
to the underlying structure. The belief is here expressed that this is a general 
and fundamental problem arising, in many fields of inquiry, as a concomitant 
of the scientific procedure that postulates the existence of a structure. 

The general formulation of the identification problem in Section 2 is, there¬ 
fore, held abstract. Some readers may prefer to give substance to the various 
concepts by reading Sections 3-4 alongside Section 2. In addition, we insert 
here a simple example showing the main features of the identification problem. 


¹ To be included in Cowles Commission Papers, New Series, No. 30. The authors reported
on this study in papers before the Berkeley meeting of the Institute of Mathematical
Statistics in June 1948. We are indebted to Dr. G. Rasch of the University of Copenhagen
and to Professor L. L. Thurstone of the University of Chicago for many fruitful discussions
on the subject matter of this article, for which the responsibility lies exclusively with the
authors.




1.2. A simple example of the identification problem. The example is concerned
with the problem of estimating the parameters, $\alpha$, $\beta$, of a linear relationship

(1.1)    $\eta_2 = \alpha + \beta \eta_1$

between two variables $\eta_1$ and $\eta_2$, both of which are observed only subject to
errors of observation $u_1$ and $u_2$. Thus, observations are available only for the
variables

(1.2)    $y_i = \eta_i + u_i$, where $E(u_i) = 0$, $i = 1, 2$.

The question under what conditions a consistent estimate of $\beta$ exists has
repeatedly attracted attention. To discuss this question, we shall consider a
model in which $\eta_1$ is independent of $(u_1, u_2)$ and in which the joint distribution
of $u_1$ and $u_2$ is normal.

If also the distribution of $\eta_1$ is normal, it is easy to see that $\beta$ cannot be
determined from a knowledge of the joint probability distribution of the observed
variables $y_1$ and $y_2$.² In this case the joint distribution of $y_1$ and $y_2$ is also normal
and the distribution is completely characterized by five parameters, $E(y_1)$,
$E(y_2)$, var $(y_1)$, var $(y_2)$, and cov $(y_1, y_2)$. The parameters $\beta$ and var $(\eta_1)$ may
now be chosen in any way such that the second term in the right hand member
of

$\begin{bmatrix} \operatorname{var}(y_1) & \operatorname{cov}(y_1, y_2) \\ \operatorname{cov}(y_1, y_2) & \operatorname{var}(y_2) \end{bmatrix} = \begin{bmatrix} 1 & \beta \\ \beta & \beta^2 \end{bmatrix} \operatorname{var}(\eta_1) + \begin{bmatrix} \operatorname{var}(u_1) & \operatorname{cov}(u_1, u_2) \\ \operatorname{cov}(u_1, u_2) & \operatorname{var}(u_2) \end{bmatrix}$

is a positive definite matrix. It is clear that if the left hand member is
nonsingular, this condition can be met for any arbitrary value of $\beta$ combined with
a sufficiently small value of var $(\eta_1)$.
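The argument can be checked numerically. The following is a minimal sketch in plain Python, not part of the original analysis: the covariance matrix of the observed variables, the two trial slopes and the value of var $(\eta_1)$ are hypothetical numbers chosen for illustration, and the helper names `error_cov` and `is_positive_definite` are ours.

```python
def error_cov(cov_y, beta, var_eta1):
    """cov(y) minus var(eta1) * [[1, beta], [beta, beta**2]]: the implied error covariance."""
    (v1, c12), (_, v2) = cov_y
    return [[v1 - var_eta1, c12 - beta * var_eta1],
            [c12 - beta * var_eta1, v2 - beta ** 2 * var_eta1]]


def is_positive_definite(m):
    """A symmetric 2x2 matrix is positive definite iff both leading minors are positive."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Hypothetical, non-singular covariance matrix of (y1, y2); not taken from the paper.
cov_y = [[2.0, 1.0], [1.0, 3.0]]

# Two very different slopes are both admissible once var(eta1) is small enough.
admissible = {beta: is_positive_definite(error_cov(cov_y, beta, var_eta1=0.1))
              for beta in (0.5, -4.0)}
print(admissible)
```

Both trial slopes leave a positive definite error covariance matrix, so the second moments of the observed variables cannot distinguish between them. Raising `var_eta1` far enough makes the remainder fail the test, which is the sense of "sufficiently small" in the text.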

It can be shown that $\beta$ is uniquely determined by the joint probability
distribution of $y_1$ and $y_2$ if this distribution is not normal. We shall prove this in
the case that certain semi-invariants exist.³

Let $\phi_{y_1 y_2}(t_1, t_2)$ denote the characteristic function of the joint distribution
of $y_1$ and $y_2$,

(1.3) $\phi_{y_1 y_2}(t_1, t_2) = E\left(e^{i t_1 y_1 + i t_2 y_2}\right)$,

and let

(1.4) $\psi_{y_1 y_2}(t_1, t_2) = \log \phi_{y_1 y_2}(t_1, t_2)$.

Similar notations will be used for the characteristic functions of other random
variables, and the logarithms of these functions.

Since $(u_1, u_2)$ and $(\eta_1, \eta_2)$ are independent, we obtain

(1.5) $\psi_{y_1 y_2}(t_1, t_2) = \psi_{\eta_1 \eta_2}(t_1, t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

2 See [13], middle of page 70.

3 The following proof is analogous to that given by Geary [8] in the case when the u's
are not supposed to be normally distributed, but independent.



IDENTIFICATION OF STRUCTURE 




and from equations (1.1) and (1.3) we obtain

$\phi_{\eta_1 \eta_2}(t_1, t_2) = E\left(e^{i t_1 \eta_1 + i t_2 (\alpha + \beta \eta_1)}\right) = e^{i \alpha t_2} \phi_{\eta_1}(t_1 + \beta t_2)$,

or

(1.6) $\psi_{\eta_1 \eta_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2)$.

Combining (1.5) and (1.6), we have

(1.7) $\psi_{y_1 y_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

where $\psi_{u_1 u_2}(t_1, t_2)$ is a polynomial of second degree, since the joint distribution
of $u_1$ and $u_2$ is normal. Let $\kappa_{rs}$ be the semi-invariants of the distribution of $(y_1, y_2)$
and let $\kappa_r$ be the semi-invariants of the distribution of $\eta_1$. Comparing coefficients
in equation (1.7), we obtain

(1.8) $\kappa_{rs} = \beta^s \kappa_{r+s}$ $(r + s \geq 3)$

and from this equation again

(1.9) $\kappa_{rs} = \beta \kappa_{r+1,\,s-1}$ $(r + s \geq 3,\ s \geq 1)$.

If at least one $\kappa_{rs}$, with $r + s \geq 3$, is finite and different from zero (which
implies that the joint distribution of $y_1$ and $y_2$ is not normal), $\beta$ may be
determined from one such equation given the joint distribution function of $y_1$ and $y_2$.
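The last step can be illustrated with exact arithmetic. The sketch below is an illustration under stated assumptions, not the authors' procedure: $\eta_1$ is given a hypothetical skewed three-point distribution, and the normal errors are left out because, being independent of $\eta_1$, they contribute nothing to semi-invariants of order three or higher. At order three the semi-invariants coincide with the central moments, which is what the hypothetical helper `third_order_cumulants` computes.

```python
from fractions import Fraction


def third_order_cumulants(values, probs, beta):
    """Exact kappa_rs with r + s = 3 for (eta1, alpha + beta * eta1).

    Third-order cumulants equal third central moments, and the intercept alpha
    drops out after centring, so it is not needed as an argument.
    """
    mean = sum(p * x for x, p in zip(values, probs))
    kappa = {}
    for r, s in ((3, 0), (2, 1), (1, 2), (0, 3)):
        kappa[(r, s)] = sum(p * (x - mean) ** r * (beta * (x - mean)) ** s
                            for x, p in zip(values, probs))
    return kappa


# Hypothetical skewed (hence non-normal) distribution of eta1.
values = [0, 1, 3]
probs = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]

kappa = third_order_cumulants(values, probs, beta=Fraction(2))

# Relation of the form kappa_rs = beta * kappa_{r+1, s-1}, here with r = 1, s = 2.
beta_recovered = kappa[(1, 2)] / kappa[(2, 1)]
print(beta_recovered)
```

The ratio returns the slope used to build the distribution. With a symmetric distribution of $\eta_1$ the third-order semi-invariants would vanish and a higher order would have to be used.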

1.3. Remarks on the history of the identification problem. The identification
problem has been discussed, in various terminologies and formulations, by
quantitative thinkers in several fields. It is interesting to note that most of the
contributions have come from researchers whose main attention was directed
to particular fields of application. For this reason, perhaps, its general
formulation was not attempted until recently.

In economics, contributions of increasing explicitness and generality were 
made by Pigou [18], Henry Schultz [20], Frisch [3], [4], [5], [6], [7], Marschak [17]. 
The main contributions to the formalization and explicit mathematical analysis 
of the problem were made so far by Haavelmo [9], Koopmans and Rubin [15], 
Wald [24], and Hurwicz [10]. 

In his books on factor analysis [21], [22], Thurstone discusses in several places
questions of identifiability. Previously the lack of identifiability in a certain
factor analysis model had been demonstrated by numerical examples by G. H.
Thomson [27]. Models used in the analysis of latent structure in attitude and
opinion research by Lazarsfeld [10] give rise to similar identification problems.
In biometrics, the “method of path coefficients” of Sewall Wright [25] is
essentially a method where a structure is postulated behind the observable
distribution, and the identifiability of that structure discussed. The identification
problem is also met with in the theory of the design of experiments, particularly
in the method of confounding (Fisher [2], Chapter 7, Yates [26]). When
confounding is used, the identifiability of certain parameters (second order
interactions, say) is sacrificed in order to gain certain advantages in the testing of
hypotheses concerning (and in the estimation of) the parameters that remain
identifiable (main effects and first order interactions, say).

2. General formulation of the identification problem. 

2.1. Latent variables, observed variables, and structure. In each of the examples
considered in this article, the distributional specification applies directly to
certain non-observable or in any case non-observed variables, variously referred
to as errors of observation (like $u_1$ and $u_2$ above), disturbances, “true” variables
(like $\eta_1$ above), specific factors, etc. We shall refer to these as latent variables,
denoted by a vector u. In addition, certain structural relationships, like (1.1)
and (1.2), are specified which connect the latent variables with the observed
variables, denoted by a vector y. The specification is therefore concerned with
the mathematical forms of both the distribution of the latent variables and the
relationships connecting observed and latent variables.

The term “mathematical form” carries a suggestion of parametric specification
which obviously is not the only possible type. We shall therefore employ terms
and concepts introduced by Hurwicz [10] which cover both parametric and
non-parametric specifications. By a structure $S = (F, \phi)$ we understand a particular
probability distribution function

(2.1) $F(u)$

of the latent variables—thought of, if you wish, as given numerically to a 
desired degree of accuracy, either by a cumulative distribution surface or curve 
or table, or parametrically by numerical values of the parameters—combined 
with a particular structural relationship (or set of simultaneously valid
relationships)

(2.2) $\phi(y, u) = 0$

between observed and latent variables—again given numerically by curves, 
surfaces or parameters—which permits unique determination of the observed 
variables y from the values of the latent variables u (except possibly for a set 
of u-values occurring with probability zero). The corresponding probability
distribution

(2.3) $H(y \mid S)$

of the apparent variables is therefore uniquely determined by the structure S,
and is said to be generated by S.

2.2. Specification of a model. We shall use the term model to signify a set of
structures. We can thus say that the specification problem is concerned with
specifying a model $\mathfrak{S}$⁴ which by hypothesis contains the structure S generating
the distribution H of the observed variables.


4 A set will be denoted by a German character corresponding to the Latin character
denoting its representative element.





As a result of this reformulation of the specification problem, a new problem 
of inference arises, which logically precedes all problems of estimation or of 
testing hypotheses. It has already been deduced from the definition of structure 
that a given structure S generates one and only one probability distribution 
H(y | S) of the apparent variables. However, statistical inference from any 
number of observations can relate only to characteristics of the distribution of 
the observed variables. The limit of statistical inference is an exact knowledge 
of this distribution function, a limit not attainable but approachable if very 
large samples can be taken. Anything not implied in this distribution is not a 
possible object of statistical inference. 

2.3. Identifiability of structural characteristics by a model. It is therefore a 
question of great practical importance whether a statement converse to the one 
just made is valid: can the distribution H of apparent variables, generated by a
given structure S contained in a model $\mathfrak{S}$, be generated by only one structure in
that model? This is by no means implied in the definitions given, and it is not
generally true. Whether or not it is true in a particular instance depends, as
illustrated in our examples, always on the model $\mathfrak{S}$, and often on the given
structure S besides. If it is true, we shall say that the model $\mathfrak{S}$ identifies the given
structure S, or that the structure S is identifiable by the model.⁵

If a structure S is not identifiable by a model $\mathfrak{S}$, some of its characteristics
may still be uniquely determinable. By a structural parameter $\theta(S)$ we
understand a functional of the structure S. (This definition applies, of course, equally
to the case of non-parametric specification of the functions F, $\phi$ defining the
structure.) We further define that two structures S and S* are (observationally)
equivalent if they generate the same distribution of observed variables,

(2.4) $H(y \mid S) = H(y \mid S^*)$ for all y.

We then say that a model $\mathfrak{S}$ identifies a parameter $\theta(S)$ in a structure $S_0$,
if that parameter has the same value in all structures S*, contained in $\mathfrak{S}$ and
equivalent to $S_0$. This definition can obviously be extended to characteristics
$x(S)$ of a structure S, other than parameters, such as the functional form of a
relationship represented by a component of the vector $\phi$, etc.
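These definitions can be made concrete on a finite toy model. The sketch below is only an illustration and is not drawn from the paper: each structure pairs a latent distribution F (a symmetric two-point distribution) with a relationship y = b·u, and the dictionary layout and the helper names `generated_distribution`, `equivalent` and `identifies` are assumptions of ours.

```python
from collections import Counter
from fractions import Fraction


def generated_distribution(structure):
    """H(y | S): push the latent distribution F through the relationship y = phi(u)."""
    H = Counter()
    for u, prob in structure["F"].items():
        H[structure["phi"](u)] += prob
    return dict(H)


def equivalent(s1, s2):
    """Observational equivalence: both structures generate the same H."""
    return generated_distribution(s1) == generated_distribution(s2)


def identifies(model, theta, s0):
    """theta is identified in s0 if it is constant over all structures equivalent to s0."""
    return all(theta(s) == theta(s0) for s in model if equivalent(s, s0))


def make_structure(b):
    half = Fraction(1, 2)
    return {"b": b, "F": {-1: half, 1: half}, "phi": lambda u, b=b: b * u}


model = [make_structure(b) for b in (-2, -1, 1, 2)]
s0 = model[3]                                            # the structure with b = 2

slope_identified = identifies(model, lambda s: s["b"], s0)
abs_slope_identified = identifies(model, lambda s: abs(s["b"]), s0)

# A narrower model (b > 0 imposed a priori) identifies the slope itself.
narrow_model = [s for s in model if s["b"] > 0]
slope_identified_narrow = identifies(narrow_model, lambda s: s["b"], s0)
print(slope_identified, abs_slope_identified, slope_identified_narrow)
```

In the wide model the structures with b = 2 and b = -2 are equivalent, so the slope is not identified while its absolute value is. Narrowing the model changes the answer, which is the sense in which identifiability depends on the model.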

2.4. The identification problem. It has now become clear that our reformulation
of the specification problem has given rise to a new group of identification
problems: to determine which of the parameters or other characteristics of a given
structure are identifiable by (or “within”) a given model.

It is perhaps premature to attempt assigning to identification problems a
definite place in a classification of statistical problems such as was undertaken
by Fisher. One might regard problems of identifiability as a necessary part of
the specification problem. We would consider such a classification acceptable,
provided the temptation to specify models in such a way as to produce
identifiability of relevant characteristics is resisted. Scientific honesty demands that

5 The concept here designated briefly as “identifiability” has been called “unique
identifiability” in another context (Koopmans and Rubin [15], also Hurwicz [10]) in
contrast with “multiple” or “incomplete” identifiability.





the specification of a model be based on prior knowledge of the phenomenon
studied and possibly on criteria of simplicity, but not on the desire for
identifiability of characteristics in which the researcher happens to be interested.

Identification problems are not problems of statistical inference in a strict
sense, since the study of identifiability proceeds from a hypothetical exact
knowledge of the probability distribution of observed variables rather than
from a finite sample of observations. However, it is clear that the study of
identifiability is undertaken in order to explore the limitations of statistical
inference.

2.5. Identifiability is subject to statistical test. Further interpenetration of the
pre-statistical analysis of identifiability with problems of statistical inference
proper arises from the fact, amply illustrated by our examples, that the
identifiability of a structural characteristic $x(S)$ often depends not only on the model,
but also on the given structure S. Thus, each structural characteristic x divides
the model $\mathfrak{S}$ exhaustively into two mutually exclusive subsets of structures

(2.5) $\mathfrak{S} = \mathfrak{S}_x + \bar{\mathfrak{S}}_x$

(of which one may be empty), such that $x(S)$ is uniquely identifiable in $S_0$ by
the model if $S_0$ belongs to $\mathfrak{S}_x$, and not uniquely identifiable if $S_0$ belongs to $\bar{\mathfrak{S}}_x$.
We shall call $x(S)$ uniformly identifiable by $\mathfrak{S}$ if $\mathfrak{S}_x$ coincides with $\mathfrak{S}$.

The subdivision of $\mathfrak{S}$ into $\mathfrak{S}_x$ and $\bar{\mathfrak{S}}_x$ has an important property: If $S_0$ belongs
to $\mathfrak{S}_x$, then all structures S* equivalent to $S_0$ also belong to $\mathfrak{S}_x$, and a similar
statement holds for $\bar{\mathfrak{S}}_x$. This property follows directly from the definition of
identifiability of $x(S)$ given above. Its meaning is that the identifiability of
$x(S)$ in $S_0$ depends only on the distribution $H(y) = H(y \mid S_0)$ of observed
variables generated by $S_0$. To the subdivision of the model corresponds an
exhaustive subdivision

(2.6) $\mathfrak{H} = \mathfrak{H}_x + \bar{\mathfrak{H}}_x$

of the set

(2.7) $\mathfrak{H} = \mathfrak{H}(\mathfrak{S})$

of all distribution functions $H(y \mid S)$ generated by the structures S of $\mathfrak{S}$, into
the subset $\mathfrak{H}_x$ containing those distribution functions $H(y \mid S)$ generated by
structures S in which $x(S)$ is uniquely identifiable, and the subset $\bar{\mathfrak{H}}_x$ containing
functions $H(y \mid S)$ generated by structures for which the opposite is true.

Hence, whenever the identifiability of $x(S)$ cannot be decided in the same
sense (affirmatively or negatively) for all structures S of $\mathfrak{S}$ as a result of either
$\mathfrak{S}_x$ or $\bar{\mathfrak{S}}_x$ being empty, then the identifiability of the characteristic $x(S)$ of
the structure S generating the observations is a property of the distribution
$H(y \mid S)$ of the observations. This identifiability is equivalent to the hypothesis

(2.8) $H(y \mid S)$ belongs to $\mathfrak{H}_x$,





which is in principle⁶ subject to statistical test under the maintained hypothesis

(2.9) $H(y \mid S)$ belongs to $\mathfrak{H}$.
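The subdivision of a model and of the set of generated distributions can be traced on a small example. The sketch below is a hypothetical construction of ours, not one of the paper's examples: a structure is a pair (b, q), the latent variable equals +1 with probability q and -1 otherwise, and the observed variable is y = b·u. The slope b plays the role of the characteristic x; it turns out to be identifiable only when the latent distribution is asymmetric, loosely echoing the role of non-normality in subsection 1.2.

```python
from fractions import Fraction


def H(structure):
    """Distribution of y = b*u when u = +1 with probability q and -1 otherwise."""
    b, q = structure
    dist = {}
    for u, prob in ((1, q), (-1, 1 - q)):
        dist[b * u] = dist.get(b * u, 0) + prob
    return frozenset(dist.items())


def split_model(model, x):
    """Return the structures in which x is identifiable, and those in which it is not."""
    ident, not_ident = [], []
    for s0 in model:
        same = all(x(s) == x(s0) for s in model if H(s) == H(s0))
        (ident if same else not_ident).append(s0)
    return ident, not_ident


half, quarter = Fraction(1, 2), Fraction(1, 4)
model = [(b, q) for b in (-1, 1) for q in (half, quarter)]

S_x, S_x_bar = split_model(model, x=lambda s: s[0])

# The corresponding subdivision of the generated distributions.
H_x = {H(s) for s in S_x}
H_x_bar = {H(s) for s in S_x_bar}
print(S_x, S_x_bar, H_x.isdisjoint(H_x_bar))
```

The two sets of distributions do not overlap, so whether the slope is identifiable can be read off the distribution of y alone: it is identifiable exactly when that distribution is asymmetric. This is the property that makes identifiability, in principle, a testable hypothesis.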

2.6. Testing particular specifications. Often the model is defined by one general
specification supplemented with a number of particular specifications which are
“detachable pieces” in the sense that they can be removed, added or replaced
by alternatives to construct alternative models. We may define the general
specification as a set $\mathfrak{S}$ of structures which is postulated to contain the model $\mathfrak{S}'$
in question as a subset. Particular specifications can then be defined as subsets
$\mathfrak{S}_1$, $\mathfrak{S}_2$, ... of $\mathfrak{S}$ of which the model $\mathfrak{S}'$ is the intersection

(2.10) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$.

An example is that of parametric specification of the “form” of the functions
$\phi(y, u)$ defining the structural relationships and of the distribution function
$F(u)$ of latent variables as the general specification, and specifications of the
values of certain parameters of $\phi$ and F as particular specifications.

In such situations, it is an important question whether a given particular
specification is, again in principle, subject to statistical test. Whenever the
answer depends on the other particular specifications, we may ask further which
minimum set of other particular specifications must (together with the general
specification) be entered into the “maintained hypothesis” in order that that
given particular specification be subject to statistical test. A formal answer to
this question, facilitating specific answers in each concrete case, can be given
as follows.

Let a model $\mathfrak{S}$ be narrowed down to an alternative model

(2.11) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1$

by a particular specification $\mathfrak{S}_1$. This particular specification will be called
observationally restrictive if the set $\mathfrak{H}(\mathfrak{S}')$ of all distribution functions $H(y \mid S')$
of observed variables generated by the structures S' of $\mathfrak{S}'$ is a proper subset
of the set $\mathfrak{H}(\mathfrak{S})$ of all distribution functions $H(y \mid S)$ generated by the structures
S of $\mathfrak{S}$. A statistical test of the particular specification $\mathfrak{S}_1$ can then be constructed
by choosing as the hypothesis subject to test

(2.12) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S}')$,

and as the maintained hypothesis

(2.13) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S})$.

The particular specification $\mathfrak{S}_1$ remains subject to test if the model $\mathfrak{S}$ is stripped
of such other particular specifications which are not necessary for the
observationally restrictive character of $\mathfrak{S}_1$, although of course the outcome of the test
may become either less or more certain as a result.


6 See subsection 2.7 below.





A frequent case of an observationally restrictive specification is that where a
parameter $\theta(S)$, already identifiable in almost all structures S of $\mathfrak{S}$, is restricted
by $\mathfrak{S}_1$ to a prescribed value (or to a prescribed point set not containing all
points of its domain for all S of $\mathfrak{S}$). In this case, the specification in question
has been called overidentifying.

2.7. Remarks on the testing of hypotheses. In subsections 2.5 and 2.6 we have
without further inquiry applied the expression “hypothesis in principle subject
to test” to any hypothesis which narrows down the set $\mathfrak{H}$ of distribution
functions H generated by structures of the model to a proper subset $\mathfrak{H}'$. It will be
clear that, to make a test actually possible, $\mathfrak{H}'$ cannot be allowed to lie
everywhere dense in $\mathfrak{H}$. For instance, if $\mathfrak{H}$ is defined parametrically, a hypothesis
restricting $\mathfrak{H}'$ to rational values of the parameters is clearly not subject to
statistical test. Just what set-theoretical requirements on $\mathfrak{H}'$ are needed to make a
test possible is a separate problem which we shall not attempt to discuss.

We have also in another sense oversimplified the problem of testing particular
specifications. In practice this problem presents itself as the choice of one out
of many possible combinations of several particular specifications, rather than
a number of separate and unconnected choices between the rejection and the
adoption of each particular specification under consideration. Present theory
of choice between two alternatives does not meet this situation.


3. An econometric example.⁷

In econometric studies⁸ economic fluctuations have been described by a system
of difference equations in (observed) economic variables y, subject to two kinds
of outside influences, emanating respectively from (observed) exogenous, i.e.,
non-economic, variables z, and from (latent) random disturbances u. Each of
these equations is given a definite meaning in terms of economic behavior. There
may for instance be equations explaining respectively consumption expenditure
(from incomes of various groups, price changes, etc.), the supply of consumers’
goods (from price margins between such goods and their raw materials and labor,
productive capacity, etc.), investment expenditure, the supply of capital goods,
etc. The purpose of the identification discussion is to investigate whether, on
the basis of given a priori knowledge as to the form of these equations, and in
particular as to what variables occur in any designated equation, procedures of
estimation or testing of hypotheses can be directed to the parameters of the
equations of economic behavior themselves, rather than to the parameters of
“secondary” equations dependent on (derivable from) two or more of the
behavior equations.

In the case of linear systems of equations, a possible form for the general
specification (the model $\mathfrak{S}$) is as follows.

(3.1) $B_0 y'(t) + B_1 y'(t - 1) + \cdots + B_{\tau_{\max}} y'(t - \tau_{\max}) + \Gamma z'(t) = u'(t)$

7 For an expository discussion of identification problems in econometric models, see [14].

8 See, for instance, J. Tinbergen [23] and L. R. Klein [12].





represents the structural relationships. Here $y'(t)$, $z'(t)$, $u'(t)$ are column
vectors (the transposes of row vectors) of G, K and G elements, respectively,
for each discrete time point or period $t = 1, 2, \ldots, T$, also
$t = 0, -1, \ldots, 1 - \tau_{\max}$, for $y'(t)$. $B_0, B_1, \ldots, B_{\tau_{\max}}$ are square matrices of
order G, and $\Gamma$ is a matrix of G rows and K columns.

(3.2) $B_0$ is non-singular.

(3.3) The observed values $z(t)$, $t = 1, \ldots, T$, are held constant in repeated
samples, and the components of $z(t)$ are linearly independent.

(3.4) The components of $u(t)$ have a joint distribution function $F(u)$ (with
zero means and finite variances) which is independent of t and of $z(t)$.

(3.5) $u(t)$ and $u(t')$ are independently distributed if $t \neq t'$.

Particular specifications $\mathfrak{S}_1$, $\mathfrak{S}_2$, ..., that have been most frequently
employed indicate prescribed values (usually zero) of specified elements of the
matrix

(3.6) $A \equiv [B_0 \;\; B_1 \;\cdots\; B_{\tau_{\max}} \;\; \Gamma]$

or of given linear functions of the elements of the g-th row $a(g)$ of A, for each
value $g = 1, \ldots, G$ of g. It can always be arranged that of the linear restrictions
on any one row of A, at most one is non-homogeneous (normalization rule), the
others homogeneous. The homogeneous restrictions state which variables enter
into each equation, and possibly with which ratios between some of their
coefficients.

It has been shown [15] that in the model $\mathfrak{S}$, a necessary and sufficient
condition for the equivalence of two structures $S = \{F(u), A\}$ and $S^* = \{F^*(u^*), A^*\}$
is that they are connected by a linear transformation

(3.7) $A^* = \Upsilon A$, $u'^* = \Upsilon u'$,

with non-singular matrix $\Upsilon$. By definition, the model

(3.8) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$

identifies a parameter $a_{gk}$ if, whenever A and A* belong to equivalent structures
S and S*, respectively, of $\mathfrak{S}'$, we have

(3.9) $a^*_{gk} = a_{gk}$.

In order to attain such identifiability by linear restrictions on the g-th row of A
it is necessary that one non-homogeneous restriction (normalization rule) on
the g-th row of A be specified in $\mathfrak{S}'$. Recalling that G represents the number of
rows (and the rank) of A, it can be proved that it is further necessary for the
simultaneous identifiability of all elements $a_{gk}$, $k = 1, \ldots, K$, in the g-th row
$a(g)$ of A, that at least $G - 1$ additional homogeneous restrictions be
imposed on that row, say

(3.10) $a(g) \Phi'(g) = 0$, $\rho\{\Phi'(g)\} \geq G - 1$,





where $a(g) \equiv [a_{g1} \cdots a_{gK}]$, the $\Phi(g)$ are given matrices (often with elements
0 or 1 only), and $\rho(X)$ denotes the rank of X. These restrictions (3.10) are also
sufficient (in addition to the normalization rule) if

(3.11) $\rho\{A \Phi'(g)\} = G - 1$.

The g-th row of the “rank criterion matrix” $A \Phi'(g)$ in (3.11) consists of zeros only,
because of (3.10). Therefore, (3.11) requires the other rows of that matrix to
be linearly independent.⁹
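The rank criterion can be tried out on a small system. The sketch below is an illustration under our own assumptions rather than a computation from the paper: it takes a hypothetical two-equation, lag-free system in the variables (q, p, z1, z2), writes the restriction matrix of the first equation as `Phi` (a single row selecting the coefficient of z2, which the first equation excludes), and checks whether the product of A with the transpose of `Phi` has rank G - 1. The helpers `rank` and `rank_criterion` are ours and use exact fractions.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_criterion(A, Phi):
    """True when rank(A Phi') equals G - 1, G being the number of rows of A."""
    A_Phi_t = [[sum(a * p for a, p in zip(row, phi_row)) for phi_row in Phi]
               for row in A]
    return rank(A_Phi_t) == len(A) - 1


# Hypothetical coefficients; columns are (q, p, z1, z2).
A = [[1, 2, 3, 0],      # first equation: z2 excluded
     [1, -1, 0, 5]]     # second equation: z1 excluded, z2 present
Phi_1 = [[0, 0, 0, 1]]  # restriction on row 1: coefficient of z2 is zero

criterion_holds = rank_criterion(A, Phi_1)

# If z2 drops out of the second equation as well, the criterion fails.
A_degenerate = [[1, 2, 3, 0],
                [1, -1, 0, 0]]
criterion_fails = rank_criterion(A_degenerate, Phi_1)
print(criterion_holds, criterion_fails)
```

The first equation is identified only as long as the excluded variable actually enters the other equation, which is a property of the remaining coefficients and not of the restrictions alone.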

Thus, even if the model $\mathfrak{S}'$ includes, besides a normalization rule, the
necessary condition (3.10) for the identifiability of the g-th behavior equation, such
identifiability is still absent in certain structures, corresponding to a point set
(generally of measure zero) in the space of the coefficients of the remaining
equations, viz., the point set in which (3.11) is not satisfied. Whether or not A actually
falls within this point set is, as was stated before in more general terms, a
property of the joint distribution function $H(y \mid z)$ of the observations y, and is
therefore subject to statistical test. In the present case, this is also seen from
the fact that the rank of $A \Phi'(g)$ is preserved by the transformation (3.7), and is
therefore itself an identifiable parameter.

For certain scientific purposes explicit knowledge of A is unnecessary. One
such purpose is “prediction without change in structure,” i.e., prediction of a
value of $y(t)$ for a future time t from a hypothetical value of $z(t)$ on the
assumption that A and $F(u)$ have not changed between the observation period and the
time point to which the prediction applies. Such prediction can be based on
the knowledge of (a) the population regressions

(3.12) $y'(t) = \Pi_1 y'(t - 1) + \cdots + \Pi_{\tau_{\max}} y'(t - \tau_{\max}) + \Pi_z z'(t) + v'(t)$

of the “jointly dependent” variables $y(t)$ on the “predetermined” variables
$y(t - 1), \ldots, y(t - \tau_{\max}), z(t)$ and of (b) the distribution function $K(v)$ of
the population residuals

(3.13) $v(t) = y(t) - E\{y(t) \mid y(t - 1), \ldots, y(t - \tau_{\max}), z(t)\}$

from these regressions. Of course, the matrices “$\Pi$” are functions of the
structural parameters (3.6) through

(3.14) $[-I \;\; \Pi] \equiv [-I \;\; \Pi_1 \;\cdots\; \Pi_{\tau_{\max}} \;\; \Pi_z] = -B_0^{-1} A$

and $K(v)$ can be derived from $F(u)$ through the transformation

(3.15) $v' = B_0^{-1} u'$.

The important fact is that $\Pi$ and $K(v)$, by their definitions, depend only on the
distribution function $H(y \mid z)$ of the observations, and are therefore uniformly
identifiable. This is also reflected in the fact that the right hand members of
(3.14) and (3.15) are invariant for the transformation (3.7).
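The invariance can be verified directly in the lag-free case. The sketch below is a small check under assumptions of ours: the coefficient matrices and the non-singular transformation matrix (called `T` in the code) are hypothetical, and the helpers `matmul`, `inverse_2x2` and `reduced_form` are not part of the paper. It premultiplies the structural coefficients by the transformation matrix, as in (3.7), and compares the reduced-form coefficients before and after.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse_2x2(M):
    """Exact inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def reduced_form(B0, Gamma):
    """Coefficients -B0^{-1} Gamma of z in the lag-free system B0 y' + Gamma z' = u'."""
    return [[-x for x in row] for row in matmul(inverse_2x2(B0), Gamma)]


# Hypothetical structural coefficients (G = 2, K = 2, no lags).
B0 = [[1, 2], [1, -1]]
Gamma = [[3, 0], [0, 5]]

# Any non-singular transformation yields an equivalent structure.
T = [[2, 1], [1, 1]]
B0_star, Gamma_star = matmul(T, B0), matmul(T, Gamma)

Pi = reduced_form(B0, Gamma)
Pi_star = reduced_form(B0_star, Gamma_star)
print(Pi == Pi_star, B0_star != B0)
```

The structural matrices differ between the two equivalent structures while the reduced-form coefficients coincide, which is why prediction without change in structure needs no knowledge of A.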


9 In that case, overidentification of $a(g)$ will result if the inequality sign in (3.10) holds.






However, the most relevant economic problems are those in which a change 
in A or F(u) is actually or hypothetically present, and in which therefore the 
identifiability of the relevant parts or functions of A and of the characteristics 
of F(u) requires separate inquiry.¹⁰

4. An example from factor analysis.¹¹ Factor analysis has been presented in
different forms by different authors. We shall here consider the multiple factor
analysis of Thurstone only [21], [22].

The factor analysis methods were developed primarily for the purpose of 
analyzing intelligence tests, but they have also been used for other psychological 
problems and in other sciences. 

Suppose that a person is given a battery of G tests. Let his score in test i
be $y_i$. The fundamental assumption in factor analysis is that these scores can
be explained in terms of a relatively small number of hypothetical primary
factors. Let $z_1, z_2, \ldots, z_p$ denote the hypothetical scores of the person in the
common factors, i.e., those primary factors which are common to at least two
tests in the battery. We assume that $y_i$ is a homogeneous linear function of
the scores $z_k$ plus a unique part $v_i$, which may be thought of as consisting of
an error term plus the contribution of a specific factor. The coefficients $\pi_{ik}$ in
the linear function just mentioned are called factor loadings. The factor loading
$\pi_{ik}$ expresses the relative importance of the common factor k in the answering
of test i.

We shall introduce the row vectors $y = [y_i]$, $z = [z_k]$, $v = [v_i]$ and the matrix
$\Pi = [\pi_{ik}]$. The covariance matrices of the sets of variables y, z, and v will be
denoted by $M_{yy}$, $M_{zz}$, and $\Lambda$, respectively.

In contrast with the preceding example, the variables y are the only observed 
variables. The variables v and z are latent variables. 

Our model will be given by the following specifications: 

(4.1) $y' = \Pi z' + v'$.

(4.2) $E(z) = 0$ and $E(v) = 0$.

(4.3) The set of variables z is stochastically independent of the set of variables v.

10 See Hurwicz [11].

11 Proofs of the statements in this section will be found in a separate paper by one of
the authors (Reiersøl [10]). It should be noted that the notation is different in the two
papers. In the separate paper the notation is close to that of Thurstone. In the present
paper the notation has been chosen to correspond in some way to the notation in the
econometric example. A list of corresponding symbols in the present paper and in Thurstone’s
books follows.

Present paper: $y_i$, $z_k$, $\pi_{ik}$, $G$, $p$, $M_{yy}$, $M_{zz}$, $\Lambda$

Thurstone: $s_j$, $x_m$, $a_{jm}$, $n$, $r$, $R_1$, $R_{pq}$, $R_1 - R$

It should be noted that $M_{yy}$, $M_{zz}$, and $\Lambda$ are covariance matrices of the original variables,
while $R_1$, $R_{pq}$, and $R_1 - R$ are covariance matrices of standardized variables.





(4.4) $\Lambda$ is diagonal and different from 0.

(4.5) The elements of z and v are jointly normally distributed.

(4.6) Each $y_i$ is correlated with at least one of the other y's.

(4.7) The rank of $\Pi$ equals the number p of its columns.

(4.8) $M_{zz}$ is nonsingular.

(4.9) p is the smallest number of variables z which is compatible with the joint
probability distribution of the observed variables y and specifications (4.1)-
(4.8).

(4.10) Each column of $\Pi$ contains at least p zeros (in unspecified places).

(4.11) A normalization rule fixing the units of the variables z and a rule fixing
the order of the columns of $\Pi$.


Denote by $\Pi_k$ the matrix consisting of all the rows of $\Pi$ which have a zero in
the k-th column. Let the number of rows in the matrix $\Pi_k$ be $p_k$. Let $\Pi_{ki}$ denote
the submatrix of $\Pi_k$ which we get when deleting the i-th row of $\Pi_k$. Using these
notations we shall formulate the final specification of our model.

(4.12) The rank of each of the matrices $\Pi_{ki}$ ($k = 1, 2, \ldots, p$; $i = 1, 2, \ldots, p_k$)
is $p - 1$.

Specification (4.1) represents the structural relationships.
Specification (4.10) means that the experimenter thinks he can construct a
sufficient number of tests where at least one of the common primary factors is
absent.

We shall first consider a model $\mathfrak{S}$ containing Specifications (4.1)-(4.9) only.
From (4.9) it follows that p is uniformly identifiable.

Let $p_0 = \tfrac{1}{2}\left(2G + 1 - \sqrt{8G + 1}\right)$. If $p > p_0$, the matrix $\Lambda$ is generally not
identifiable. If $p < p_0$, $\Lambda$ generally is identifiable. When $p = p_0$, the number
of values of $\Lambda$, which correspond to a given covariance matrix $M_{yy}$, is usually
finite, and may be equal to one or greater than one. The matrices $\Pi$ and $M_{zz}$
are never identifiable in the model $\mathfrak{S}$. If $\Lambda$ is identifiable, the set of all
structures $\{\Pi^*, M^*_{zz}, \Lambda\}$ equivalent to the structure $\{\Pi, M_{zz}, \Lambda\}$ is given by the set
of all matrices

(4.13) $\Pi^* = \Pi \Psi'$

and

(4.14) $M^*_{zz} = (\Psi')^{-1} M_{zz} \Psi^{-1}$,

where $\Psi$ is any square, p-rowed and nonsingular matrix.
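Both statements lend themselves to a quick numerical check. The sketch below is an illustration only: the function name `p0`, the loadings, the factor covariance matrix and the transformation matrix are hypothetical choices of ours, with G = 6 tests and p = 2 common factors. It evaluates the threshold for a few battery sizes and confirms that a transformation of the form (4.13)-(4.14) leaves the common-factor part of the covariance matrix of y unchanged.

```python
import math
from fractions import Fraction


def p0(G):
    """Threshold p_0 = (2G + 1 - sqrt(8G + 1)) / 2 for a battery of G tests."""
    return (2 * G + 1 - math.sqrt(8 * G + 1)) / 2


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def common_part(Pi, M_zz):
    """Pi M_zz Pi': the part of the covariance of y due to the common factors."""
    return matmul(matmul(Pi, M_zz), transpose(Pi))


thresholds = {G: p0(G) for G in (3, 6, 10)}

# Hypothetical loadings (6 tests, 2 factors), factor covariance, and transformation.
Pi = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]]
M_zz = [[2, 1], [1, 1]]
Psi = [[1, 1], [0, 1]]

Psi_inv = inverse_2x2(Psi)
Pi_star = matmul(Pi, transpose(Psi))                            # as in (4.13)
M_star = matmul(matmul(transpose(Psi_inv), M_zz), Psi_inv)      # as in (4.14)

same_covariance = common_part(Pi_star, M_star) == common_part(Pi, M_zz)
print(thresholds, same_covariance)
```

With six tests the threshold equals 3, so two common factors fall in the range where the unique-part covariance matrix is generally identifiable, while the loadings themselves remain undetermined up to the transformation shown.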

In the following we shall confine our discussion to the case $p < p_0$, and to
structures in which the matrix $M_{yy}$ is such that $\Lambda$ is identifiable in $\mathfrak{S}$.





We shall now consider the model $\mathfrak{S}'$ defined by Specifications (4.1)-(4.11).
In this model a necessary and sufficient condition for the identifiability of $\Pi$ is
that any square p-rowed minor of $\Pi$ which is of rank $p - 1$ is contained in one of
the matrices $\Pi_k$. This condition excludes the possibility that all elements
belonging to the intersection of $p - 1$ rows and two columns of $\Pi$ are all equal to
zero. In order to be able to use this result, the experimenter would have to be
able to construct tests where one, but not more than one, common factor would
be absent. Therefore the result is not particularly useful. In order not to exclude
the case where two common factors occur in more than $p - 2$ tests, we have
introduced Specification (4.12).

We shall finally consider the model $\mathfrak{S}''$ defined by Specifications (4.1)-(4.12).
Assuming $M_{yy}$ known, we can determine some value $\Pi^*$ of $\Pi$ which satisfies
Specifications (4.1)-(4.9). Since, by assumption, $\Lambda$ is identifiable in $\mathfrak{S}$, $\Pi^*$ must be
of the form $\Pi \Psi'$, where $\Pi$ is the true factor loadings matrix and $\Psi$ is non-singular.
Let $\Pi^*_k$ be a submatrix of $\Pi^*$ containing all the columns of $\Pi^*$ and satisfying the
following conditions.

(4.15) The rank of $\Pi^*_k$ is $p - 1$.

(4.16) The addition to $\Pi^*_k$ of a row contained in $\Pi^*$ but not in $\Pi^*_k$ increases
the rank to p.

(4.17) Each submatrix of $\Pi^*_k$ obtained by deleting one row of $\Pi^*_k$ has rank $p - 1$.

A necessary and sufficient condition for the identifiability of $\Pi$ in the
complete model $\mathfrak{S}''$ is that there exist exactly p submatrices $\Pi^*_k$ of $\Pi^*$ which satisfy
conditions (4.15)-(4.17), and that the p vectors $q_k$, satisfying the equations
$\Pi^*_k q_k = 0$ when $k = 1, 2, \ldots, p$, are linearly independent.

It should be noted that Specifications (4.10) and (4.12) are observationally
restrictive, i.e., they are in principle subject to statistical test.

5. A comparative discussion of the examples given. Some comparative
remarks on the three examples given in sections 1.2, 3 and 4 may illustrate our
general discussion of the identification problem, given in section 2.

In each of the three examples considered, the model contains a general
specification prescribing a parametric form of the structural relationships (2.2).
Further particular specifications therefore take the form of parameter
specifications in the function $\phi(y, u)$ in (2.2) and possibly in the distribution function
(2.1) of latent variables. A comparison of the three examples shows a striking
formal similarity of the identification problems to which they give rise. This
similarity justifies our speaking of identification problems as a separate group
of problems preparatory to statistical inference, of quite widespread occurrence.
The same definitions of structure, model, parameter, identifiability are applicable
and useful in each example. In all three cases, parameters occur, the identifiability
of which depends on other identifiable structural characteristics (the normality
of a distribution function in one case, the ranks of parameter matrices in the
other two cases).





Oui remaining remarks will be diawn from the m'nii'mvlui' and factor iiualvt i,-. 
examples only, partly because IIicm* llln,-trail* tin* i«i'nsinn pr-ibbtn in 
greater elaboration, partly because the closer similarity of thiv-t*<-\amplt'sjM‘imitK 
us to notice interesting differences in greatei detail. 

Let us consider the particular case of the econometric example when there are no time lags between the $y$'s in the structural relationships, i.e., when the maximum lag is zero.

In this case the reduced form (3.12) in the econometric example is of the same form as equation (4.1), which defines the structural relationships in the factor analysis example. The notation in the factor analysis example has been chosen with this similarity in mind. However, it should be emphasized that, while the variables $y$ are observed in both examples and the variables $v$ are latent in both examples, the variables $z$ are observed in the econometric example and latent in the factor analysis example, and even the number of variables $z$ is an unknown parameter $p$ in the latter example. For this reason, the discussion of the identifiability of $\Lambda$ in factor analysis has no counterpart in the econometric model. Furthermore, the identifiability of the matrix $\Pi$, which is automatic and uniform in the econometric model $\mathfrak{S}_e$, say, requires detailed specifications in the factor analysis model $\mathfrak{S}_f$, say, including the diagonality of $\Lambda$ and prescriptions about the number of zero elements in each column.

The observability of $z$ in the econometric case is exploited to postulate, behind the reduced form (3.12), a structure $\{F(u), A\}$ to be identified (where possible) from further specifications based on economic theory. Here we meet with another analogy, with differences, between the identification problem of $A$ in $\mathfrak{S}_e$, and that of $\Pi$ (given $\Lambda$) in $\mathfrak{S}_f$. In the latter problem, the set of matrices $\Pi^*$, belonging to a set of equivalent structures, is given by equation (4.13). This equation is analogous to the first of the equations (3.7) in the econometric case, with $\Pi$ in $\mathfrak{S}_f$ now corresponding to $A'$ in $\mathfrak{S}_e$.

If we were to specify zeros in assigned places in the factor loadings matrix $\Pi$, and to introduce a normalization rule for each column of $\Pi$, the results quoted in the econometric example would immediately be applicable to the factor analysis case. A necessary condition for the identifiability of $\Pi$, given that of $\Lambda$, would be that the number of specified zeros in each column of $\Pi$ be at least $p - 1$. Necessary and sufficient for identifiability would be that the matrix consisting of all rows of $\Pi$ which have specified zeros in the $k$th column be of rank $p - 1$, for each value of $k$.
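The rank condition just stated can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function names `matrix_rank` and `rank_condition_holds`, the example matrices `pi_ok` and `pi_bad`, and the convention that every zero entry counts as a "specified" zero are all hypothetical and do not come from the paper.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of row lists), by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def rank_condition_holds(pi, zero_mask):
    """True if, for every column k, the rows with a specified zero in
    column k form a matrix of rank p - 1 (p = number of columns)."""
    p = len(pi[0])
    for k in range(p):
        rows = [row for row, mask in zip(pi, zero_mask) if mask[k]]
        if matrix_rank(rows) != p - 1:
            return False
    return True

# Hypothetical loadings with p = 3 factors; every zero is treated as specified.
pi_ok = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 3, 0], [0, 0, 1], [0, 0, 2]]
pi_bad = [[1, 0, 0], [0, 1, 1], [0, 2, 2], [1, 0, 1], [1, 1, 0]]
mask = lambda pi: [[x == 0 for x in row] for row in pi]

print(rank_condition_holds(pi_ok, mask(pi_ok)))
print(rank_condition_holds(pi_bad, mask(pi_bad)))
```

In `pi_bad` the two rows with a zero in the first column are proportional, so they have rank 1 rather than $p - 1 = 2$ and the condition fails for that column.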


However, instead of specifying that given elements of $\Pi$ be equal to zero, Thurstone assumes that we know that there is a certain minimum number of zeros in each column, but that we do not know which particular elements are zero. The specification of a certain number of zeros in undesignated places obviously represents a weaker assumption than the specification of the same number of zeros in assigned places. It is therefore not surprising that the specification of $p - 1$ zeros in each column is never sufficient for identifiability of $\Pi$. Thus, in our model, we have introduced the stronger specification (4.10). We have seen that even this specification is too weak to be practically useful, and have introduced the additional Specification (4.12), which makes the factor analysis model still more different from the econometric model.

Continuing the analogy in which $A'$ in $\mathfrak{S}_e$ corresponds to $\Pi$ in $\mathfrak{S}_f$, we note an important feature common to both examples, and present in other situations as well. Even if specifications sufficient, in number and variety of “points of application,” for the identifiability of all structural parameters cannot be derived from a priori considerations, it remains possible to construct uniformly identifiable functions of these parameters, knowledge of which constitutes scientific information of more limited usefulness.

In the econometric example we have already seen that for certain purposes a knowledge of the uniformly identifiable matrix $\Pi$ of the reduced form is sufficient, while for other purposes we need to know the matrix $A$. As a further illustration, suppose that we want to test for persistence of the structure by comparing the equation systems which we estimate from data for two different periods. Disregarding errors of estimation (which are not our present topic), if $A$ is the same in both cases, $\Pi$ will also be the same in both cases. It is therefore possible to arrive at a rejection of the persistence hypothesis by determining $\Pi$ in both cases. Suppose next that one row (or several rows) of $A$ are different in the two periods, while the other rows of $A$ are identical in the two cases. If $B_0$ changes from one period to the other, we may expect each element of $\Pi$ to change. If we can determine $A$ for each period, the equality (as between periods) of some of the rows of $A$ will indicate precisely the extent of validity of the persistence hypothesis. If we cannot determine $A$ but only $\Pi$ in each case, this verification will be lost.

Similarly, it may in factor analysis be sufficient for some purposes to consider what we may call the reduced form of $\Pi$. Let $\Pi_I$ be the upper square part of $\Pi$, which we shall assume to be nonsingular. The matrix $\Pi \Pi_I^{-1}$ will be called the reduced form of $\Pi$. Its upper square part is then the identity matrix. The reduced form is always identifiable when $\Lambda$ is identifiable.


Suppose now that the same battery of tests is given to two different populations. Suppose that some of the factor loadings are different in the two populations, while other factor loadings are the same. If at least one of the different factor loadings occurs in the matrix $\Pi_I$, then each element of the lower part of the reduced form may be expected to change, and the partial identity of the two structures cannot be discovered if we determine the reduced form only and not $\Pi$. On the other hand, if $\Pi$ is the same in both cases, the reduced form will also be the same in both populations.

Let us next consider two different batteries given to the same population. We shall suppose that the two batteries have some tests in common. For each test which is common to the two batteries we ought to find the same factor loadings in both batteries. In other words, the matrices $\Pi$ in the two cases ought to be partly identical. On the other hand, if $\Pi_I$ contains rows corresponding to tests which are not common to the two batteries, the lower parts of the reduced forms will be entirely different in the two cases. Therefore, again, identification of $\Pi$ will be necessary to verify the equality of the factor loadings of tests common to both batteries.





A final remark relates to observationally restrictive specifications. Particularly where the model is to a large degree speculative, empirical confirmation of the validity or usefulness of the model is obtained only to the extent that observationally restrictive specifications are upheld by the data. Thus, Thurstone emphasizes that the number of factors $p$ should be well below the value found above to be necessary in general for the identifiability of $\Lambda$, before a factor analysis can be regarded as successful (Thurstone [22], p. 291).

In econometric work, greater reliance is sometimes placed on a priori specification of the form of a behavior equation, particularly the variables occurring in it. If the linear restrictions on an equation in a linear system are just sufficient for its identifiability, estimation of the parameters of that equation is possible, but none of the identifying restrictions are themselves subject to test. Again, dependence on a priori information is diminished (but not eliminated) to the extent that a greater number of overidentifying restrictions are imposed and are upheld by the data.


REFERENCES

[1] R. A. Fisher, “On the mathematical foundations of theoretical statistics,” Phil. Trans. Roy. Soc., London, Ser. A, Vol. 222 (1922), p. 309.

[2] R. A. Fisher, The Design of Experiments, Oliver and Boyd, Edinburgh and London, 1935.

[3] R. Frisch, “Correlation and scatter in statistical variables,” Nordic Stat. Jour., Vol. 1 (1928), p. 36.

[4] R. Frisch and B. D. Mudgett, “Statistical correlation and the theory of cluster types,” Jour. Am. Stat. Assn., Vol. 26 (1931), p. 375.

[5] R. Frisch, “Pitfalls in the statistical construction of demand and supply curves,” Veröffentlichungen der Frankfurter Ges. für Konjunkturforschung, Neue Folge, Heft 5, Leipzig, 1933.

[6] R. Frisch, Statistical Confluence Analysis by Means of Complete Regression Systems, Publ. No. 5, Universitetets Økonomiske Institutt, Oslo, 1934.

[7] R. Frisch, Statistical Versus Theoretical Relations in Economic Macrodynamics, mimeographed document for League of Nations conference, 1938.

[8] R. C. Geary, “Inherent relations between random variables,” Proc. Roy. Irish Acad., Vol. 47 (1942), Sect. A, No. 6.

[9] T. Haavelmo, “The probability approach in econometrics,” Econometrica, Vol. 12 (1944), Suppl.

[10] L. Hurwicz, “Generalization of the concept of identification,” Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.

[11] L. Hurwicz, “Prediction and least-squares,” ibid.

[12] L. R. Klein, “Economic fluctuations in the United States, 1921-1941,” Cowles Commission Monograph 11, New York, John Wiley and Sons, 1950.

[13] T. Koopmans, Linear Regression Analysis of Economic Time Series, Netherlands Economic Institute, Publ. No. 20, Haarlem, 1937.

[14] T. Koopmans, “Identification problems in economic model construction,” Econometrica, Vol. 17 (1949), p. 125.

[15] T. Koopmans, H. Rubin, and R. B. Leipnik, “Measuring the equation systems of dynamic economics,” Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.

[16] P. F. Lazarsfeld, “The logical and mathematical foundation of latent structure analysis,” “The interpretation of some latent structures,” Measurement and Prediction, Vol. 4, Studies in Psychology of World War II, 1950.

[17] J. Marschak, “Economic interdependence and statistical analysis,” Studies in Mathematical Economics and Econometrics, University of Chicago Press, 1942, pp. 135-150.

[18] A. C. Pigou, “A method of determining the numerical values of elasticities of demand,” Economic Journal, Vol. 20 (1910), pp. 636-640.

[19] O. Reiersøl, “On the identifiability of parameters in Thurstone's multiple factor analysis,” Psychometrika, Vol. 15 (1950).

[20] H. Schultz, Theory and Measurement of Demand, University of Chicago Press, 1938.

[21] L. L. Thurstone, The Vectors of Mind, University of Chicago Press, 1935.

[22] L. L. Thurstone, Multiple-factor Analysis, University of Chicago Press, 1947.

[23] J. Tinbergen, Business Cycles in the United States of America, 1919-1932, League of Nations, Geneva, 1939.

[24] A. Wald, “Note on the identification of economic relations,” Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.

[25] S. Wright, “The method of path coefficients,” Annals of Math. Stat., Vol. 5 (1934), p. 161.

[26] F. Yates, “The design and analysis of factorial experiments,” Imp. Bur. Soil Sci., Tech. Comm., No. 35, 1937.

[27] G. H. Thomson, “The proof or disproof of the existence of general ability,” Brit. Jour. Psych., Vol. 9 (1919), p. 323.



SOME PROBLEMS IN MINIMAX POINT ESTIMATION 


By J. L. Hodges, Jr.,¹ and E. L. Lehmann
University of California, Berkeley

1. Summary. In the present paper the problem of point estimation is considered in terms of risk functions, without the customary restriction to unbiased estimates. It is shown that, whenever the loss is a convex function of the estimate, it suffices from the risk viewpoint to consider only nonrandomized estimates. For a number of specific problems the minimax estimates are found explicitly, using the squared error as loss. Certain minimax prediction problems are also solved.


2. Introduction. The principles most commonly applied in the selection of a point estimate are the principles of maximum likelihood (R. A. Fisher) and of minimum variance unbiased estimation (Markoff).² Both of these principles are intuitively appealing, but neither of them can be justified very well in a systematic development of statistics. This holds also for some modifications of these principles proposed by G. W. Brown [1], as the author himself points out.

In an important early paper [2], Wald indicated a more systematic approach to the problem, which he later developed into his general theory of statistical decision problems [3, 4, 5]. Consider a random variable $X$ distributed over a space $\mathcal{X}$ according to a distribution $P_\theta$ with $\theta \in \Omega$. It is desired to estimate some function $g(\theta)$. If the value $x$ of $X$ is observed one makes an estimate, say $f(x)$, and thereby incurs a loss of $W[g(\theta), f(x)]$ when $\theta$ is the true value of the parameter. We shall assume that the loss function is nonnegative. It then follows that the expectation of the loss will always exist (although it may be infinite). The risk associated with the estimate $f$ is defined to be the expected loss, as given by


(2.1) $R_f(\theta) = E_\theta W[g(\theta), f(X)] = \int_{\mathcal{X}} W[g(\theta), f(x)] \, dP_\theta(x).$

The choice of estimate should then be made according to the risk function. As a particular possibility Wald suggests the use of minimax estimates, i.e. estimates which minimize $\sup_\theta R_f(\theta)$.

The main purpose of the present paper is to obtain minimax estimates for a number of specific problems. Only few such problems have been worked out so far, the emphasis in Wald's work having been on the general theory. In [2] Wald obtained the minimax estimate of an unknown location parameter. Stein and Wald [6] treated the sequential problem of estimating the mean of a normal distribution with known variance, and in his forthcoming book Wald considers as an example the sequential problem of estimating the mean of a random variable distributed uniformly over an interval of length 1.

¹ This work was supported in part by the Office of Naval Research.

² Actually, the principle of minimum variance unbiased estimation goes back to Gauss. See E. Czuber, Theorie der Beobachtungsfehler, Leipzig, 1891, and R. L. Plackett, “A historical note on the method of least squares,” Biometrika, Vol. 36 (1949), p. 458.

It seems worthwhile to consider further special problems both because one may obtain estimates that in some cases are preferable to the conventional ones, and because these examples throw some light on the general desirability of the minimax principle. As we shall see below, it does not seem possible to reach any definite conclusions on this latter point, and to obtain a generally valid comparison between the minimax estimate and, for example, the unbiased estimate with uniformly smallest variance (when such an estimate exists).

Consider, for example, the problem of estimating the probability of success from a number of independent trials each of which may be a success or a failure, when the loss function is the squared error. If the number of trials is one, the minimax estimate (as is shown below) is given by $f(X) = \tfrac{1}{2}X + \tfrac{1}{4}$, where $X$ is 1 or 0 as the trial is a success or failure. As is easily seen, this estimate has smaller risk than the usual estimate $f^*(X) = X$ whenever $0.07 \le p \le 0.93$. On the other hand, when the number of trials is large the standard estimate $\bar{X}$ has smaller risk than the minimax estimate nearly everywhere. The minimax estimate is only slightly better in a small interval (centered at $p = \tfrac{1}{2}$), whose length tends to zero as the number of trials tends to infinity, and is worse everywhere else.
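The comparison above can be reproduced numerically. The sketch below is illustrative only: the helper names are hypothetical, and the general-$n$ estimate $(X + \tfrac{1}{2}\sqrt{n})/(n + \sqrt{n})$ is the standard binomial minimax estimate under squared error, which this passage does not state and which is brought in here as an assumption (it reduces to $\tfrac{1}{2}X + \tfrac{1}{4}$ when $n = 1$).

```python
from math import comb, sqrt

def risk(estimate, n, p):
    """Expected squared error of estimate(x) when X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (estimate(x) - p) ** 2
               for x in range(n + 1))

def usual(n):
    # The conventional estimate: the observed proportion of successes.
    return lambda x: x / n

def minimax(n):
    # Assumed general-n form; equals x/2 + 1/4 when n == 1.
    r = sqrt(n)
    return lambda x: (x + r / 2) / (n + r)

def better_interval(n, grid=10001):
    """Smallest and largest grid value of p at which the minimax
    estimate has strictly smaller risk than the usual estimate."""
    ps = [i / (grid - 1) for i in range(grid)]
    wins = [p for p in ps if risk(minimax(n), n, p) < risk(usual(n), n, p)]
    return wins[0], wins[-1]

lo1, hi1 = better_interval(1)
lo100, hi100 = better_interval(100, grid=2001)
print(lo1, hi1)
print(lo100, hi100)
```

For one trial the minimax risk is the constant $1/16$, and the interval on which it beats $p(1 - p)$ matches the range quoted in the text. For larger $n$ the interval contracts toward $p = \tfrac{1}{2}$, though rather slowly.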

For our purpose it is convenient to formulate the problem of point estimation as follows (see in this connection [7]). A random variable $X$ is distributed over a space $\mathcal{X}$ according to a distribution $P$ belonging to a family $\mathcal{F}$. We wish to estimate $g(P)$, where $g$ is a function whose domain is $\mathcal{F}$ and whose range is contained in some space $\mathcal{G}$ (in any example $\mathcal{G}$ is usually a Euclidean space, mostly even a one-dimensional Euclidean space). An estimate is a statistic $f(X)$ taking on values in $\mathcal{G}$. We denote by $W[g(P), f(x)]$ the loss which results from making the estimate $f(x)$ when $P$ is the true distribution, and we define the risk function of the estimate $f$ by

(2.2) $R_f(P) = E_P W[g(P), f(X)].$

The problem is to determine $f$ so as to minimize $\sup_{P \in \mathcal{F}} R_f(P)$.

Our principal tool will be the following theorem, which is essentially contained in Wald's work but which is not stated there explicitly. The theorem is a slight modification of one used for the theory of testing in [3].

Theorem 2.1. Let $\{P_\theta\}$, $\theta \in \omega$ (where $\omega$ is a subset of a Euclidean space), be a parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a probability measure over $\omega$. Suppose that $f$ minimizes

(2.3) $\int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta)$

and that

(i) $E_\theta W[g(P_\theta), f(X)]$ is constant (say $c$) for all $\theta \in \omega$,

(ii) $E_P W[g(P), f(X)] \le c$ for all $P$ in $\mathcal{F}$.

Then $f$ is a minimax estimate for estimating $g$.





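Proof. Let $f^*$ be any other estimate of $g$. Then

(2.4) $\sup_{P \in \mathcal{F}} E_P W[g(P), f(X)] = \int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta) \le \int_\omega E_\theta W[g(P_\theta), f^*(X)] \, d\lambda(\theta) \le \sup_{P \in \mathcal{F}} E_P W[g(P), f^*(X)].$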

We note that if $f$ is the unique function minimizing (2.3), then the first inequality in (2.4) becomes strict, and hence $f$ is the unique minimax estimate of $g$.

Following Wald we shall call the function $f$ that minimizes (2.3) the Bayes estimate of $g$ associated with the a priori distribution $\lambda$. As a corollary to theorem 2.1, we note that a Bayes estimate whose risk function is constant is a minimax estimate.

3. Randomization. In the formulation of the problem of point estimation given above, the estimate $f(x)$ is assumed to be completely determined by the observed value $x$ of the random variable $X$. In the present section a broader formulation of the problem will be considered, in which the estimate corresponding to $x$ may itself be a random variable, say $T_x$. This extension is a special case of the notion of randomized decision function introduced by Wald in his general decision theory. We associate with each $x$ in $\mathcal{X}$ a probability distribution $F_x$, with the convention that when $X$ is observed to have the value $x$, we estimate $g(P)$ by means of a random variable $T_x$ which is distributed according to $F_x$. Estimates of this latter kind we shall call randomized, and the fixed estimates $f(x)$ nonrandomized.

The motivation behind the admission of randomized estimates (or more generally of randomized statistical decision functions) is that in some problems of statistical inference the performance of the decision function is considerably improved by randomization. It is clear however that the randomized functions are more complicated, and hence that it is useful to know when their consideration is not necessary. Before investigating this question we give the following definition, which makes precise a sense in which certain estimates may be omitted from consideration. (See Wald [9]).

Definition. For a given estimation problem a class $C$ of estimates will be said to be essentially complete with respect to a class $D$ of estimates, if for every estimate $g$ in $D$ there exists an estimate $f$ in $C$ such that $R_f(P) \le R_g(P)$ for all $P$ in $\mathcal{F}$. If $D$ is the class of all randomized estimates we simply say that $C$ is essentially complete for the given problem.

It is clear that if one adopts the risk function point of view, one loses nothing by restricting consideration to an essentially complete class of estimates. In the present section we find conditions under which the totality of nonrandomized estimates forms an essentially complete class.





For this purpose we need the notion of convexity. A set $S$ in a $k$-dimensional Euclidean space is said to be convex if, whenever $P$ and $Q$ are in $S$, then all points on the line segment from $P$ to $Q$ are also in $S$. A real valued function $\phi$ defined over a $k$-dimensional Euclidean space is said to be convex, if for any points $(x_1, \ldots, x_k)$ and $(y_1, \ldots, y_k)$ of the space, and any number $0 < \alpha < 1$ we have

(3.1) $\alpha \phi(x_1, \ldots, x_k) + (1 - \alpha) \phi(y_1, \ldots, y_k) \ge \phi(\alpha x_1 + (1 - \alpha) y_1, \ldots, \alpha x_k + (1 - \alpha) y_k).$

We use the following notation for conditional expectation. If $U$ and $V$ are two random variables which have a joint distribution, then $E(U \mid v)$ denotes the conditional expectation of $U$ given that $V = v$, and $E(U \mid S)$ denotes the conditional expectation of $U$ given that $V$ is in $S$. Let $\Phi(v) = E(U \mid v)$; then for $\Phi(V)$ we write $E(U \mid V)$.

Lemma 3.1. Let $U$, $V$ be two random variables with a joint distribution, such that $U$ is distributed in a $k$-dimensional space and $E(U)$ is finite. Let $\phi$ be a real-valued convex function defined over this space and bounded from below. Then

$E\{\phi[E(U \mid V)]\} \le E[\phi(U)].$

Proof. The proof is immediate in the special case that, for almost all $v$, there exists a determination of the conditional probability distribution of $U$ given $v$ which is a measure. We then know, from the convexity of $\phi$, that for almost all values $v$ of $V$, $\phi[E(U \mid v)] \le E[\phi(U) \mid v]$. Replacing $v$ by $V$ and taking expectations of both sides, we obtain the desired result.

If we do not assume the existence of conditional measures, the proof is more complicated. Since $E(U)$ is finite, there exists a function $E(U \mid v)$ such that for any set $S$, $E(U \mid S) = E[E(U \mid V) \mid S]$; see [10], p. 47. Since $\phi$ is convex it is measurable, and since $\phi$ is bounded from below $E[\phi(U)]$ exists. Excluding the trivial case $E[\phi(U)] = \infty$, we know there exists a function $E[\phi(U) \mid v]$ such that for any set $S$, $E[\phi(U) \mid S] = E\{E[\phi(U) \mid V] \mid S\}$.

If the lemma were false, we should have $E\{E[\phi(U) \mid V]\} < E\{\phi[E(U \mid V)]\}$, and could find an $\epsilon > 0$ and a set $A$ of positive $V$ measure such that for every $v \in A$, $E[\phi(U) \mid v] + 2\epsilon < \phi[E(U \mid v)]$. This implies the existence of a number $d$ and a set $B$ of positive $V$ measure such that for every $v \in B$, $E[\phi(U) \mid v] \le d$ and $d + \epsilon \le \phi[E(U \mid v)]$. Since $\phi$ is convex, the domain $D$ of points $P$ for which $\phi(P) < d + \epsilon$ is convex, and we may find a subset $C$ of $B$, of positive $V$ measure, for which the set of points $E(U \mid v)$, $v \in C$, lies in a convex domain $E$ disjoint from $D$. It follows that $E(U \mid C)$ lies in $E$, and hence that $\phi[E(U \mid C)] \ge d + \epsilon$. Clearly $d \ge E[\phi(U) \mid C]$. Thus we have the contradiction $E[\phi(U) \mid C] < \phi[E(U \mid C)]$.

Definition. A loss function $W$ will be called convex if for every $u \in \mathcal{G}$, $W(u, v)$ is a convex function of the estimate $v$.

An example of a convex loss function is provided by the Markoff principle of estimation. The variance of an unbiased estimate may be considered as a risk function if we take the loss function to be the squared error, i.e. the square of the difference between the true value $g(P)$ and the estimated value $f(x)$ or $T_x$; and this loss function is clearly convex.

Theorem 3.2. If the loss function $W$ is convex, if $\mathcal{G}$ is in a Euclidean space, and if we consider only estimates having finite expectation, then the class of nonrandomized estimates is essentially complete.

Proof. Let $T_X$ be any randomized estimate such that $E(T_X)$ exists and is finite. Applying lemma 3.1 we see that $E(T_X \mid X)$, which as a function of $X$ only is a nonrandomized estimate, has a risk nowhere greater than that of $T_X$.

The restriction in theorem 3.2 to estimates having finite expectation may be replaced by the requirement that for each $u$ and $w$ there exist a number $M$ such that if $|v - w| \ge M$, then $W(u, v) > W(u, w)$. With this requirement and the convexity assumption, it follows that the risk associated with $T_X$ is infinite whenever $E(T_X)$ is infinite.
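The effect described in theorem 3.2 is easy to see numerically for squared error. The sketch below is only an illustration: the binomial setting, the helper names, and the particular randomized estimate `noisy` (the observed proportion plus or minus 0.1 with equal probability) are assumptions chosen for the example, not constructions from the paper.

```python
from math import comb

def binom_pmf(n, x, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def risk_randomized(dist, n, p, loss):
    """Risk of a randomized estimate; dist(x) returns a list of
    (value, probability) pairs describing the distribution of T_x."""
    return sum(binom_pmf(n, x, p) * sum(q * loss(p, t) for t, q in dist(x))
               for x in range(n + 1))

def conditional_mean(dist):
    """The nonrandomized estimate x -> E(T_x | X = x), written as a
    one-point distribution so that risk_randomized can evaluate it."""
    return lambda x: [(sum(q * t for t, q in dist(x)), 1.0)]

squared_error = lambda p, t: (p - t) ** 2

# Hypothetical randomized estimate: x/n plus or minus 0.1, each with probability 1/2.
n = 5
noisy = lambda x: [(x / n + 0.1, 0.5), (x / n - 0.1, 0.5)]

for p in (0.1, 0.5, 0.9):
    r_random = risk_randomized(noisy, n, p, squared_error)
    r_fixed = risk_randomized(conditional_mean(noisy), n, p, squared_error)
    print(p, r_random, r_fixed)
```

Under squared error the randomization adds exactly its own variance (here $0.1^2$) to the risk at every $p$, so replacing $T_x$ by its conditional expectation never hurts.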

Theorem 3.2 is related to a generalization of a theorem of Blackwell. If $Y$ is a sufficient statistic for $\mathcal{F}$, and if for almost all $y$ the conditional distribution of $X$ given $y$ exists in the sense of measure, we may regard estimation of $g(P)$ based on $X$ as randomized estimation of $g(P)$ based on $Y$, and if the assumptions of theorem 3.2 are satisfied, we may apply this theorem to conclude the essential completeness of the class of nonrandomized estimates based on $Y$. In the general case we may resort again to lemma 3.1 to prove the following theorem; the proof is the same as that of theorem 3.2 if $X$ is replaced by $Y$ throughout.

Theorem 3.3. If the loss function $W$ is convex, if $\mathcal{G}$ is in a Euclidean space, if we consider only estimates having a finite expectation, and if $Y$ is a sufficient statistic for $\mathcal{F}$, then the class of nonrandomized estimates which are functions of $Y$ only is essentially complete.

Blackwell [11] proved that, if $U$ is a sufficient statistic for a real-valued parameter $\theta$, and if $T$ is an unbiased estimate for $\theta$, then $E(T \mid U)$, which is a function of $U$ only and also an unbiased estimate for $\theta$, has a variance which never exceeds that of $T$. Observing that the theorems above hold true when we restrict attention to unbiased estimates, Blackwell's result may be obtained from theorem 3.3 by letting $\mathcal{G}$ be one-dimensional, letting $W$ be the squared error, and restricting ourselves to unbiased estimates. In a similar manner we can get from theorem 3.3 an extension of Blackwell's theorem given by Barankin [12], who treated the case in which $W(\theta, t) = |\theta - t|^s$, $s > 1$. It is clear that these loss functions are convex.

If the convexity assumption is removed, theorems 3.2 and 3.3 cease to be true. For example, if $\mathcal{X}$ has only $n$ points, if $\mathcal{G}$ is a finite line segment of length greater than $2na$, and if the loss is 0 whenever $|g(P) - f(x)| \le a$ and 1 otherwise, then the minimax risk among nonrandomized estimates is 1. By admitting randomization, however, the maximum risk can be brought below 1 without using $X$ at all. If our estimate $T$ is uniformly distributed over $\mathcal{G}$, then the maximum risk will be $1 - a/(\text{length of } \mathcal{G})$.
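A small numerical sketch of this counterexample follows. It assumes, for illustration only, that the segment is $[0, L]$ and that $g$ can take any value in it; the function names and the sample numbers are hypothetical. A nonrandomized estimate can take at most $n$ values, which cover at most length $2na$ of the segment, so some value of $g$ is always missed; the uniform estimate misses with probability at most $1 - a/L$.

```python
def uncovered_point(values, a, length):
    """Return a point g in [0, length] with |g - v| > a for every v in
    values, or None if the intervals [v - a, v + a] cover the segment."""
    g = 0.0  # left edge of the part of the segment not yet known to be covered
    for v in sorted(values):
        if g < v - a:
            # Gap before [v - a, v + a]: its midpoint is strictly uncovered.
            return (g + (v - a)) / 2
        g = max(g, v + a)
    return (g + length) / 2 if g < length else None

def uniform_risk(g, a, length):
    """Risk at g of an estimate T uniform on [0, length], under the loss
    that is 0 when |g - T| <= a and 1 otherwise."""
    covered = min(length, g + a) - max(0.0, g - a)
    return 1 - covered / length

a, length = 0.1, 1.0
values = [0.1, 0.5, 0.9]          # n = 3 possible estimates, and 2*n*a = 0.6 < 1
g_bad = uncovered_point(values, a, length)
max_uniform = max(uniform_risk(i / 1000 * length, a, length) for i in range(1001))
print(g_bad, max_uniform)
```

At `g_bad` every one of the three possible nonrandomized values incurs loss 1, whatever the distribution of $X$, while the uniform estimate has maximum risk $1 - a/L$, attained at the endpoints of the segment.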

The example just given may seem inappropriate, in that with the specified loss function the problem would customarily be considered one of interval estimation rather than point estimation. This objection does not apply however to the loss functions considered in the following theorem.

Theorem 3.4. Let $\mathcal{X} = \{0, 1, \ldots, n\}$, $n > 3$. Let $\mathcal{F}$ be the set of binomial distributions defined by $P_p(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, $0 \le p \le 1$. Let $\mathcal{G}$ be the closed interval $[0, 1]$ and $g(P_p) = p$. Let $W(p, t) = |p - t|^s$, $0 < s < 1$. Then no minimax estimate can be nonrandomized, and the class of nonrandomized estimates is not essentially complete.

Proof. For any nonrandomized estimate $f$, $R_f(p)$, being a sum of products of continuous functions of $p$, is itself a continuous function of $p$. The nonrandomized minimax risk is less than 1, as may be shown by considering any estimate of the following kind: $f(0) = 0$, $f(n) = 1$, and $0 < f(x) < 1$ for all other $x$. Here $R_f(0) = R_f(1) = 0$, while if $0 < p < 1$, $R_f(p) \le \max_x |p - f(x)|^s < 1$. By continuity $\sup_{0 \le p \le 1} R_f(p) < 1$.

It is easy to see that there exists among the nonrandomized estimates a minimax estimate, say $h$. Let the corresponding minimax risk be denoted by $M$. We know that $M = \sup_{0 \le p \le 1} R_h(p) < 1$; it is obvious that $M > 0$. Observe that $h(0) < 1$, since $h(0) \ge 1$ leads to the contradiction $R_h(0) = |h(0)|^s \ge 1$. We can write

$R_h(p) = P_p(X = 0) \cdot |p - h(0)|^s + \sum_{x=1}^{n} P_p(X = x) \cdot |p - h(x)|^s.$

The second sum has a finite derivative with respect to $p$ at $p = h(0)$, while the first term increases with infinite speed as $p$ is moved away from $h(0)$. Therefore $R_h[h(0)] < M$; and by an exactly symmetrical argument, $0 < h(n)$ and $R_h[h(n)] < M$. Using the continuity of $R_h$, we can find a positive number $\omega$ so small that $R_h(p) < M$ whenever $|p - h(0)| < \omega$ or $|p - h(n)| < \omega$.

Consider now the randomized estimate $T_x$ defined by $T_x = h(x)$ if $0 < x < n$, and by $T_x = h(x) + \alpha Y$ otherwise, where $Y$ is a random variable independent of $X$ and taking on the values $1$ and $-1$ each with probability $\tfrac{1}{2}$, and where $0 < \alpha < \omega$. Observe

$R_{T_x}(p) - R_h(p) = (1 - p)^n \left[ \tfrac{1}{2} \{ |p - h(0) + \alpha|^s + |p - h(0) - \alpha|^s \} - |p - h(0)|^s \right] + p^n \left[ \tfrac{1}{2} \{ |p - h(n) + \alpha|^s + |p - h(n) - \alpha|^s \} - |p - h(n)|^s \right].$


By the concavity of the functions involved, the first square bracketed term is negative whenever $|p - h(0)| \ge \alpha$, and the second is negative whenever $|p - h(n)| \ge \alpha$. We can choose $\alpha$ so small that whenever either $|p - h(0)|$ or $|p - h(n)|$ is less than $\alpha$, $R_{T_x}(p) - R_h(p) < \omega$. A continuity argument now shows that $\sup_{0 \le p \le 1} R_{T_x}(p) < M$. But this proves that no minimax estimate, with randomization permitted, can be nonrandomized. It is also now obvious that the class of nonrandomized estimates is not essentially complete: every nonrandomized estimate must have a risk function which somewhere exceeds $\sup_{0 \le p \le 1} R_{T_x}(p)$.




4. General properties of minimax estimation. Whether a principle such as the minimax principle is a desirable one has to be decided mainly on two criteria:

(i) its general properties, and

(ii) its performance in many particular instances.

It has already been remarked that in the second respect the minimax principle does not seem entirely satisfactory. With regard to the former, one great advantage of this principle is that when there is a unique minimax estimate, it is admissible. Here an estimate $f$ is said to be admissible (see [8]) if there exists no other estimate $f^*$ such that $R_{f^*}(P) \le R_f(P)$ for all $P$ in $\mathcal{F}$ with strict inequality holding for some $P$. It is interesting that, as we shall show below, this admissibility property is not shared by either the principle of unbiasedness or the maximum likelihood principle.

In this connection we begin by proving another theorem concerning essentially 
complete classes. 

Theorem 4.1. Suppose that the space $\mathcal{G}$ is a finite interval $[a, b]$ on the real line, and that for each $u \in \mathcal{G}$, $W(u, v)$ is a non-decreasing function of $v$ when $v > u$ and a non-increasing function of $v$ when $v < u$. Then the class of estimates whose range is contained in $\mathcal{G}$ is essentially complete with respect to the class of all real valued estimates.

Proof. If $T$ is any real-valued estimate, define $T^*$ by

(4.1) $T^* = \begin{cases} T & \text{if } T \in \mathcal{G}, \\ a & \text{if } T < a, \\ b & \text{if } T > b. \end{cases}$

It is clear that $R_{T^*}(P) \le R_T(P)$ for every $P \in \mathcal{F}$.
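The truncation in (4.1) is a one-line operation. The sketch below, with the hypothetical name `truncate` and absolute error standing in for any loss of the kind allowed by theorem 4.1, shows that truncating never increases the loss when the true value lies in $[a, b]$.

```python
def truncate(t, a, b):
    """T* of (4.1): t itself when a <= t <= b, otherwise the nearer endpoint."""
    return a if t < a else b if t > b else t

a, b = 0.0, 1.0
for u in (0.0, 0.3, 1.0):                  # true values inside [a, b]
    for t in (-2.0, -0.1, 0.3, 0.9, 1.5):  # raw estimates, some outside [a, b]
        print(u, t, abs(u - t), abs(u - truncate(t, a, b)))
```

Because the inequality holds for every observed value, it carries over to the expected loss, which is the content of the proof.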

Halmos [7] has provided an example in which the minimum variance unbiased estimate takes on, with positive probability, values outside the range of the parameter. It can be shown from the proof of theorem 4.1 that in this case any unbiased estimate is inadmissible, provided the loss function is of the kind described in theorem 4.1.

That the maximum likelihood principle may also lead to inadmissible estimates is easy to show, since this is the case in many familiar situations. The following example may be of interest in that here the maximum likelihood estimate is uniformly worst among all estimates which one would consider using.


Example. Let $X$ be a random variable with only 0 and 1 as possible values, and let $P(X = 1) = p$. Assume it to be known that $\tfrac{1}{3} \le p \le \tfrac{2}{3}$. Then the maximum likelihood estimate for $p$ is easily seen to be $\tfrac{1}{3}(X + 1)$, and, if the loss function is squared error, the associated risk function is $\tfrac{1}{3}(p - \tfrac{1}{2})^2 + \tfrac{1}{36}$. This risk function is uniformly larger than that of any estimate $f(X)$ [...].
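The risk formula in this example can be verified with a few lines of Python. The sketch is illustrative: the function names are hypothetical, and the constant estimate $\tfrac{1}{2}$ used for comparison is chosen here only as one simple competitor; it is not named in the text.

```python
def risk(f0, f1, p):
    """Squared-error risk of the estimate taking the value f0 when X = 0
    and f1 when X = 1, where P(X = 1) = p."""
    return (1 - p) * (f0 - p) ** 2 + p * (f1 - p) ** 2

def mle_risk_formula(p):
    """Closed form quoted in the example: (1/3)(p - 1/2)^2 + 1/36."""
    return (p - 0.5) ** 2 / 3 + 1 / 36

# Grid over the known range 1/3 <= p <= 2/3.
grid = [1 / 3 + i * (1 / 3) / 100 for i in range(101)]
for p in grid[::25]:
    print(p, risk(1 / 3, 2 / 3, p), mle_risk_formula(p), risk(0.5, 0.5, p))
```

On this range the constant estimate $\tfrac{1}{2}$ has risk $(p - \tfrac{1}{2})^2 \le \tfrac{1}{36}$, which is smaller than the risk of the maximum likelihood estimate at every $p$.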

The selection of loss function in any problem should in theory be governed by economic considerations, but in fact the circumstances of statistical problems do not usually offer compelling reasons for using one loss function rather



.MI.VIMAX POINT ESTIMATION 


189 


than another. CmiMderatninK of mathematical facility aic often determining. 
Thus, varinus classical unbiased estimates become minimax estimates when the 
loss function is judiciously chosen, For, if wc take as loss function the ratio of 
squared error to the variance of Iho unbiased estimate, the risk becomes constant, 
and wo can easily obtain the. classical estimates as minimax estimates in the 
familiar binomial, Poisson, and rectangular problems, and in some of the non- 
parnmefric problems considered in section 6. 

However, this approach seems to be somewhat artificial, and hereafter we
shall restrict ourselves to a single loss function, namely the squared error. There
are two reasons for this choice. With squared error for the loss, the mathematical
problems are rather simple. And as was remarked above, squared error (if one
restricts oneself to unbiased estimates) is the traditional loss function.
Fortunately, the squared error loss function is convex, and hence theorem 3.2 permits
us to avoid considering randomized estimates.

When the loss function is squared error, we have the following obvious linearity
property, which for later reference we state as

Theorem 4.2. If $f(X)$ is the minimax estimate for $g(P)$, then $af(X) + b$ is the
minimax estimate for $a \cdot g(P) + b$.

However, as we shall show by an example in the next section, it need not be
true that if $X_1, \cdots, X_n$ are independent and $f_i(X_i)$ is the minimax estimate for
$g_i(P_i)$, $i = 1, \cdots, n$, then $\sum_{i=1}^n a_i f_i(X_i)$ is the minimax estimate for
$\sum_{i=1}^n a_i g_i(P_i)$. This is a definite disadvantage of the minimax principle as
compared with the Markoff principle which does possess the linearity property
mentioned.

We conclude this section with an explicit solution of the Bayes problem in the
squared error case. If the distribution $P$ is itself a random variable distributed
over $\mathcal{F}$ according to some distribution $\lambda$, we may compare estimates $f$ by means
of their expected loss $Q(f) = E[g(P) - f(X)]^2$. Since $Q(f) = E\{E([g(P) - f(X)]^2 \mid X)\}$,
it is well known that $Q(f)$ is minimized by using the estimate
$f(x) = E[g(P) \mid x]$, provided the conditional measures exist. In fact, this result
holds even without this assumption.

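Theorem 4.3. $E[g(P) - f(X)]^2$ is minimized by $f(x) = E[g(P) \mid x]$.

Proof. $E[g(P) - f(X)]^2 - E\{g(P) - E[g(P) \mid X]\}^2 = E\{E[g(P) \mid X] - f(X)\}^2$
$+\ 2E\{E[(g(P) - E[g(P) \mid X])(E[g(P) \mid X] - f(X)) \mid X]\} \ge 0$, since the inner
conditional expectation in the last term vanishes.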

In applications it is convenient to write $E[g(P) \mid X]$ more explicitly. Suppose
that with respect to some measure $\mu$ over $\mathcal{X}$ each distribution $P \in \mathcal{F}$ has a
generalized probability density $p_P$, so that for any $A$, the probability that $X \in A$,
computed for $P$, is given by

$$\int_A p_P(x)\, d\mu(x).$$

Minimizing a quadratic expression, one sees that

(4.2)  $f(x) = \dfrac{\int_{\mathcal{F}} g(P)\, p_P(x)\, d\lambda(P)}{\int_{\mathcal{F}} p_P(x)\, d\lambda(P)}$

is a Bayes solution.

5. Binomial and hypergeometric distributions. In the present section we shall
consider three discrete minimax problems.

Problem 1. (Binomial.) Let $X$ be a binomial random variable with parameter
$p$, $0 \le p \le 1$, so that $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$. We shall show that the
minimax estimate for $p$ is

(5.1)  $\dfrac{X}{n} \cdot \dfrac{\sqrt{n}}{\sqrt{n} + 1} + \dfrac{1}{2(\sqrt{n} + 1)}.$

Consider any linear estimate $\alpha X + \beta$. The risk $E(\alpha X + \beta - p)^2$ is a quadratic
function of $p$ which is constantly equal to $\beta^2$ when

$$\alpha = \frac{1}{\sqrt{n}\,(1 + \sqrt{n})} \quad \text{and} \quad \beta = \frac{1}{2(1 + \sqrt{n})}.$$

Hence (5.1) is a constant risk estimate of $p$. Since it is easily seen that

$$\frac{\int_0^1 p^{a+k} q^{n-k+b-1}\, dp}{\int_0^1 p^{a+k-1} q^{n-k+b-1}\, dp} = \frac{a + k}{a + b + n} \qquad (q = 1 - p),$$

it follows that (5.1) is the Bayes estimate when $p$ is distributed with probability
density $C(pq)^{(\sqrt{n}/2) - 1}$, and hence by Theorem 2.1 we conclude that (5.1) is the
minimax estimate of $p$.
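The constant-risk property of (5.1) is easy to confirm numerically. The sketch below is illustrative only: the function names and the choice $n = 16$ are assumptions made here, not taken from the paper. It sums the squared error over the binomial distribution for several values of $p$.

```python
from math import comb, sqrt

def binomial_risk(estimate, n, p):
    """Squared-error risk E[(estimate(X) - p)^2] for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (estimate(k) - p) ** 2
        for k in range(n + 1)
    )

def minimax_estimate(n):
    """Estimate (5.1): (X/n) * sqrt(n)/(sqrt(n) + 1) + 1/(2 (sqrt(n) + 1))."""
    r = sqrt(n)
    return lambda k: (k / n) * r / (r + 1) + 1 / (2 * (r + 1))

n = 16  # illustrative sample size
risks = [binomial_risk(minimax_estimate(n), n, j / 10) for j in range(11)]
constant = 1 / (4 * (1 + sqrt(n)) ** 2)  # beta^2, equal to 0.01 when n = 16
```

Every entry of `risks` agrees with `constant` up to floating-point rounding, as the constant-risk property requires.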

After obtaining this result we were informed that it had been obtained earlier
by H. Rubin, to whom, therefore, the priority belongs.

It is interesting to compare the risk of the above estimate with that of the
standard unbiased estimate $X/n$. We have

$$E\left[\left(\frac{X}{n} - p\right)^2\right] = \frac{pq}{n}.$$

As is easily seen, $\dfrac{pq}{n} \le \dfrac{1}{4(1 + \sqrt{n})^2}$ if and only if

$$\left| p - \tfrac12 \right| \ge \frac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})}.$$

Thus the standard estimate is better than the minimax estimate outside an
interval around $p = \frac12$ whose length decreases with increasing $n$, tending to 0 as
$n$ tends to infinity. However, for very small values of $n$ the minimax estimate has
the smaller risk over nearly the whole range.
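To see how quickly the interval in which the minimax estimate is preferable shrinks, one can tabulate its half-width $\sqrt{1 + 2\sqrt{n}}\,/\,(2(1 + \sqrt{n}))$. This is a small illustrative sketch; the function names and the sample sizes are chosen here and are not from the paper.

```python
from math import sqrt

def minimax_risk(n):
    """Constant risk of the minimax estimate (5.1)."""
    return 1 / (4 * (1 + sqrt(n)) ** 2)

def unbiased_risk(n, p):
    """Risk pq/n of the standard unbiased estimate X/n."""
    return p * (1 - p) / n

def half_width(n):
    """Distance from 1/2 beyond which X/n has the smaller risk."""
    return sqrt(1 + 2 * sqrt(n)) / (2 * (1 + sqrt(n)))

sizes = (1, 4, 16, 100, 10_000)
widths = [half_width(n) for n in sizes]
```

For $n = 16$ the half-width is exactly 0.3, so the two risks coincide at $p = 0.2$ and $p = 0.8$, and the unbiased estimate has the smaller risk only outside $[0.2, 0.8]$.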

Problem 2. (Difference of binomials.) Let $X$ and $Y$ be independent binomial
random variables, where $P(X = k) = \binom{n}{k} p_1^k (1 - p_1)^{n-k}$ and
$P(Y = l) = \binom{n}{l} p_2^l (1 - p_2)^{n-l}$. By use of theorem 2.1 we shall show that the
minimax estimate for $p_1 - p_2$ is

$$\frac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \frac{X}{n} - \frac{Y}{n} \right).$$

For the set $\omega$ of theorem 2.1 we take $p_1 = p$, $p_2 = 1 - p$, $0 \le p \le 1$, and we let
$Z = X + n - Y$. Applying the result of Problem 1 to $Z$, we find the minimax
estimate of $p$ to be $\alpha_{2n} \cdot Z + \beta_{2n}$, and by Theorem 4.2 the minimax estimate
based on $Z$ for $p_1 - p_2 = 2p - 1$ is

$$\frac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \frac{X}{n} - \frac{Y}{n} \right),$$

and the risk of this estimate is constant over $\omega$.

To prove that this is also the minimax estimate of $p_1 - p_2$ for the original
problem, we consider the risk as a function of $p_1$ and $p_2$. It is easy to show that
$(1 + \sqrt{2n})^2 R(p_1, p_2) = 2[p_1(1 - p_1) + p_2(1 - p_2)] + (p_1 - p_2)^2$. Finally it
can be shown that $p_1(1 - p_1) + p_2(1 - p_2)$ is maximized, subject to the condition
that $p_1 - p_2$ be constant, when $p_1 + p_2 = 1$.
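The two claims of Problem 2, that the risk is constant on the line $p_1 + p_2 = 1$ and nowhere larger elsewhere, can be illustrated by exact summation over the joint distribution of $X$ and $Y$. The sketch below is an illustration under assumptions made here (hypothetical function names, $n = 8$, a handful of test points); it is not the paper's proof.

```python
from math import comb, sqrt

def pmf(n, p, k):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def difference_risk(n, p1, p2):
    """Exact risk of sqrt(2n)/(1 + sqrt(2n)) * (X/n - Y/n) as an estimate of p1 - p2."""
    c = sqrt(2 * n) / (1 + sqrt(2 * n))
    return sum(
        pmf(n, p1, k) * pmf(n, p2, l) * (c * (k - l) / n - (p1 - p2)) ** 2
        for k in range(n + 1)
        for l in range(n + 1)
    )

def risk_formula(n, p1, p2):
    """Closed form: (1 + sqrt(2n))^2 R = 2[p1(1-p1) + p2(1-p2)] + (p1 - p2)^2."""
    numerator = 2 * (p1 * (1 - p1) + p2 * (1 - p2)) + (p1 - p2) ** 2
    return numerator / (1 + sqrt(2 * n)) ** 2

n = 8  # illustrative; sqrt(2n) = 4, so the constant risk is 1/25
on_line = [difference_risk(n, p, 1 - p) for p in (0.0, 0.1, 0.3, 0.5, 0.9)]
off_line = difference_risk(n, 0.2, 0.5)
```

The numerator of the closed form simplifies to $1 - (p_1 + p_2 - 1)^2$, which makes both the constancy on $p_1 + p_2 = 1$ and the maximum there visible at a glance.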

Problem 3. (Hypergeometric.) We finally consider the problem of estimating
the number of defectives in a lot from a sample drawn from this lot at random.
We denote by $N$ and $n$ the number of elements in lot and sample respectively,
and by $D$ and $X$ the corresponding number of defectives. For later reference we
note

$$P(X = k) = \frac{\binom{D}{k} \binom{N - D}{n - k}}{\binom{N}{n}}, \qquad E(X) = \frac{nD}{N}, \qquad \sigma_X^2 = \frac{nD(N - n)(N - D)}{N^2 (N - 1)}.$$

As in Problem 1 we easily find a linear function of $X$ whose risk is constant.
In fact

$$E_D(\alpha X + \beta - D)^2 = \beta^2$$

when

$$\alpha = \frac{N}{n + \sqrt{\dfrac{n(N - n)}{N - 1}}}, \qquad \beta = \frac{N}{2} \left( 1 - \frac{\alpha n}{N} \right).$$



To prove that $\alpha X + \beta$ is the minimax estimate of $D$ we shall show that it is the
Bayes estimate corresponding to

(5.2)  $P(D = d) = \displaystyle\int_0^1 \binom{N}{d} p^d q^{N-d} \cdot C p^{a-1} q^{b-1}\, dp,$

where $a, b > 0$, and

$$C = \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)}.$$

In this connection it is useful to notice that since (5.2) is a distribution

(5.3)  $\displaystyle\sum_{d=0}^{N} \binom{N}{d} \frac{\Gamma(a + d)\, \Gamma(N + b - d)}{\Gamma(N + a + b)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)} = 1.$

Using theorem 4.3, we find the Bayes estimate associated with (5.2) to be

$$f(k) = \frac{\sum_{d} d\, P(X = k \mid D = d)\, P(D = d)}{\sum_{d} P(X = k \mid D = d)\, P(D = d)}.$$

Replacing $d$ by $(d + a) - a$, using the relation

$$\binom{d}{k} \binom{N - d}{n - k} \binom{N}{d} = \binom{N - n}{d - k} \frac{N!}{k!\, (n - k)!\, (N - n)!},$$

and writing $i$ for $d - k$, we find:

$$f(k) = \frac{\sum_{i=0}^{N-n} \binom{N - n}{i}\, \Gamma(k + i + a + 1)\, \Gamma(N + b - k - i)}{\sum_{i=0}^{N-n} \binom{N - n}{i}\, \Gamma(k + i + a)\, \Gamma(N + b - k - i)} - a.$$

Now apply (5.3) to numerator and denominator separately; then

$$f(k) = k\, \frac{a + b + N}{a + b + n} + \frac{a(N - n)}{a + b + n}.$$

Putting $\dfrac{a + b + N}{a + b + n} = \alpha$, $\dfrac{a(N - n)}{a + b + n} = \beta$, one obtains easily

$$a = \frac{\beta}{\alpha - 1}, \qquad b = \frac{N - \alpha n - \beta}{\alpha - 1}.$$

Substituting the values of $\alpha$ and $\beta$ one finds that $\beta > 0$, $N > \alpha n + \beta$ and that
$\alpha > 1$ provided $N > n + 1$. In the special case $N = n$ the result is immediate,
while if $N = n + 1$, the result is obtained by giving to $D$ a binomial distribution
with $p = \frac12$.
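The constant-risk linear estimate of Problem 3 can be checked by summing over the hypergeometric distribution for every possible number of defectives $D$. In this sketch the lot and sample sizes are arbitrary illustrative choices, the function name is hypothetical, and $\beta$ is taken as $(N - \alpha n)/2$, the value forced by requiring the coefficient of $D$ in the risk to vanish; this is an assumption of the sketch.

```python
from math import comb, sqrt

def hypergeometric_risk(N, n, D, alpha, beta):
    """E_D[(alpha X + beta - D)^2] when X is hypergeometric (lot N, sample n, defectives D)."""
    total = comb(N, n)
    return sum(
        comb(D, k) * comb(N - D, n - k) / total * (alpha * k + beta - D) ** 2
        for k in range(n + 1)
    )

N, n = 10, 4  # illustrative lot and sample sizes
alpha = N / (n + sqrt(n * (N - n) / (N - 1)))
beta = (N - alpha * n) / 2  # assumed form of beta, see the lead-in
risks = [hypergeometric_risk(N, n, D, alpha, beta) for D in range(N + 1)]
```

All eleven risks equal $\beta^2$ up to rounding, and the values satisfy $\alpha > 1$ and $\beta > 0$ as the argument requires when $N > n + 1$.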





6. Nonparametric problems. We shall in this section consider estimation
problems in which the functional form of the distribution of $X$ is not assumed
known. Restrictions will be imposed on the variables only to insure the existence
of estimates with bounded risk. The problem will be treated under two different
such restrictions: (i) that the variables are bounded with known bounds, (ii) that
the variables have bounded variances.

In the first of these cases we can assume without loss of generality that the
variables are distributed over the interval $[0, 1]$, and then obtain

Theorem 6.1. Let $X_1, \cdots, X_n$ be independently distributed over $[0, 1]$ according
to a joint distribution belonging to a family $\mathcal{F}$. Suppose that $\mathcal{F}$ contains the subfamily
$\mathcal{F}_0$ according to which $X_1, \cdots, X_n$ are independently and identically distributed
with $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$, $0 \le p \le 1$. Let $E(X_i) = \mu_i$,
$\frac1n \sum_{i=1}^n \mu_i = \mu$. Then the minimax estimate of $\mu$ is

(6.1)  $\dfrac{1}{1 + \sqrt{n}} \left( \sqrt{n}\, \bar{X} + \tfrac12 \right).$

Proof. Since (6.1) is the minimax estimate of $\mu = p$ when the distribution of
the $X$'s is known to belong to $\mathcal{F}_0$, we only need to show that its risk is largest for
the distributions of $\mathcal{F}_0$. But

$$E(A\bar{X} + B - \mu)^2 = A^2 \sigma_{\bar{X}}^2 + [B + (A - 1)\mu]^2 = \frac{A^2}{n^2} \sum_{i=1}^n \sigma_i^2 + [B + (A - 1)\mu]^2$$

and

$$\sum \sigma_i^2 = \sum E(X_i^2) - \sum \mu_i^2 \le \sum \mu_i - \sum \mu_i^2 = n\mu(1 - \mu) - \sum (\mu_i - \mu)^2 \le n\mu(1 - \mu),$$

where equality holds for the distributions in $\mathcal{F}_0$.
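The inequality used in this proof can be illustrated for identically distributed variables, where the risk of (6.1) depends on the distribution only through its mean and variance. The sketch below is illustrative; the function name, $n = 25$, and the three distributions tried are choices made here.

```python
from math import sqrt

def estimate_risk(n, mu, variance):
    """Risk of (6.1) for i.i.d. X_i on [0, 1] with the given mean and variance."""
    A = sqrt(n) / (1 + sqrt(n))   # coefficient of the sample mean in (6.1)
    B = 1 / (2 * (1 + sqrt(n)))   # constant term in (6.1)
    return A**2 * variance / n + (B + (A - 1) * mu) ** 2

n = 25
bound = 1 / (4 * (1 + sqrt(n)) ** 2)  # constant risk attained in the binomial subfamily
binary = [estimate_risk(n, p, p * (1 - p)) for p in (0.0, 0.2, 0.5, 0.8, 1.0)]
uniform = estimate_risk(n, 0.5, 1 / 12)   # uniform distribution on [0, 1]
point_mass = estimate_risk(n, 0.3, 0.0)   # all mass at the single point 0.3
```

Two-point distributions on $\{0, 1\}$ attain the bound, while the uniform distribution and the point mass fall strictly below it.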

Corollary 6.2. Let $X_1, \cdots, X_n$ be a sample from an unknown univariate
distribution over $[0, 1]$. Then the minimax estimate of $E(X_i) = \mu$ is given by (6.1).

Corollary 6.3. Let $X_1, \cdots, X_n$ be a sample from an unknown absolutely
continuous univariate distribution over $[0, 1]$. Then the minimax estimate of
$E(X_i) = \mu$ is given by (6.1).

Corollary 6.3 follows from the fact that any risk function that can be obtained
for binomial distribution can be approximated by means of absolutely continuous
distributions.

Theorem 6.1 can be extended to include variables that are negatively correlated.
Namely if $X_1, \cdots, X_n$ are distributed over $[0, 1]$ according to a joint
distribution belonging to some family $\mathcal{F}$, if for each distribution of $\mathcal{F}$ the correlation
coefficient $\rho_{ij}$ of $X_i$, $X_j$ is $\le 0$ for all $i \ne j$, and if $\mathcal{F}$ contains the family $\mathcal{F}_0$ of
theorem 6.1, then the conclusion of this theorem remains valid. This result can
be used for example in the following situation. Suppose a sample of $n$ is taken
from a lot of unknown size, and suppose it is desired to estimate the proportion
$p$ of defectives in the lot. If $k$ is the number of defectives in the sample, it follows
from the above remarks that the minimax estimate of $p$ is

$$\frac{1}{1 + \sqrt{n}} \left( \frac{k}{\sqrt{n}} + \frac12 \right).$$





It should be pointed out that the result holds only if no upper bound is assumed
known for the lot size. If it is known that the number of items in the lot is $\le N_0$,
then the minimax estimate is that found in section 5 for the hypergeometric
distribution with $N = N_0$.

Next let us consider estimating the difference of the means in two
groups of variables.

Theorem 6.4. Let $X_1, \cdots, X_n$; $Y_1, \cdots, Y_n$ be independently distributed over
the interval $[0, 1]$ according to a joint distribution belonging to a family $\mathcal{F}$. Suppose
that $\mathcal{F}$ contains the subfamily $\mathcal{F}_0$, according to which $X_1, \cdots, X_n$; $Y_1, \cdots, Y_n$
are two samples with $P(X_i = 1) = p_1$, $P(X_i = 0) = 1 - p_1$, $P(Y_i = 1) = p_2$,
$P(Y_i = 0) = 1 - p_2$, $0 \le p_1, p_2 \le 1$. If $E(X_i) = \mu_i$, $E(Y_i) = \nu_i$,
$\frac1n \sum \mu_i = \mu$, $\frac1n \sum \nu_i = \nu$, then the minimax estimate of $\mu - \nu$ is

(6.2)  $\dfrac{\sqrt{2n}}{1 + \sqrt{2n}} (\bar{X} - \bar{Y}).$

Proof. Again, since (6.2) is the minimax estimate in the binomial case (Problem 2
of section 5), we need only verify that its risk is a maximum in $\mathcal{F}_0$. But

$$E[A(\bar{X} - \bar{Y}) - (\mu - \nu)]^2 = E[A(\bar{X} - \mu) - A(\bar{Y} - \nu) + (A - 1)(\mu - \nu)]^2 = A^2(\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2) + (A - 1)^2 (\mu - \nu)^2,$$

of which we already have shown that it is maximized in the binomial case.

Up to now we assumed the variables to be bounded. Let us now suppose instead
that the variances are bounded. With this assumption we can give an
analogue of the classical Markoff theorem on least squares.

Theorem 6.5. Suppose that $X_1, \cdots, X_n$ are independently distributed according
to a joint distribution belonging to some family $\mathcal{F}$, which contains the subfamily
$\mathcal{F}_0$ where the $X$'s are normal with variance $M^2$. Suppose that for all distributions in
$\mathcal{F}$, $E(X_i) = \sum_{j=1}^{s} a_{ij} \theta_j$ and $\sigma_i^2 \le M^2$. We assume the matrix $(a_{ij})$ to be known
and of rank $s \le n$. Then the estimate $(f_1(X), \cdots, f_s(X))$ of $(\theta_1, \cdots, \theta_s)$, which
minimizes $\sup E \sum_{i=1}^{s} [f_i(X) - \theta_i]^2$, is the Markoff estimate.

Proof. Consider first the subfamily $\mathcal{F}_0$. Then there exists an orthogonal
transformation to $Y_1, \cdots, Y_n$ such that $E(Y_i) = k_i \theta_i$ for $i = 1, \cdots, s$, where
$k_i > 0$; $E(Y_i) = 0$ for $i = s + 1, \cdots, n$, and $\sigma_{Y_i}^2 = M^2$ for $i = 1, \cdots, n$.
Then $(Y_1, \cdots, Y_s)$ is a sufficient statistic for $(\theta_1, \cdots, \theta_s)$, and it is easily shown,
using the methods of [6], that $\left( \dfrac{Y_1}{k_1}, \cdots, \dfrac{Y_s}{k_s} \right)$ is the minimax estimate for
$(\theta_1, \cdots, \theta_s)$. But this is the Markoff estimate. In order to complete the proof we
must show that the risk of this estimate takes on in $\mathcal{F}_0$ its supremum over $\mathcal{F}$. But
this is immediate, for

$$E \sum_{i=1}^{s} [f_i(X) - \theta_i]^2 = \sum_{i=1}^{s} E\left( \frac{Y_i}{k_i} - \theta_i \right)^2 \le M^2 \sum_{i=1}^{s} \frac{1}{k_i^2}.$$





In a similar manner it is easily shown that the least squares estimate for a
linear function of one or more of the $\theta$'s is the minimax estimate.

Theorem 6.5 gives a justification of the least squares estimate different from
that of the Markoff theorem. In the Markoff theorem, it is shown that the least
squares estimate has uniformly smallest risk among all linear unbiased estimates;
here it is shown that the least squares estimate minimizes the maximum risk
among all estimates. (The assumptions concerning variances also differ.)

7. Prediction problems. Frequently one is interested in estimating the value
of a random variable rather than that of a parameter. A customary method for
this is to estimate the expectation of the random variable (a parameter) and then
to "identify" the variable and its expectation; i.e., to use the estimate of the
expectation as a prediction for the variable. As we shall see below one is led to
this procedure if one adopts the point of view of unbiased estimation, so that
from this point of view prediction poses no new problem. This however is no
longer true when one employs the minimax principle.

Consider a pair $X$, $Y$ of random variables having a joint distribution $P$
belonging to a family $\mathcal{F}$ of distributions. It is desired to use the observed $X$ to
predict, say, $g(Y)$. We are interested in minimax predictions, i.e., functions
$f(X)$ which minimize $\sup_{P \in \mathcal{F}} E_P W[g(Y), f(X)]$. To obtain minimax predictions
we need the following analogue of Theorem 2.1.

Theorem 7.1. Let $\{P_\theta\}$, $\theta \in \omega$, be a parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a
probability measure over $\omega$. Suppose that $f$ is such that $\int E_\theta W[g(Y), f(X)]\, d\lambda(\theta)$ is
minimum, and that

(i) $E_\theta W[g(Y), f(X)]$ is constant, say $= c$, for all $\theta \in \omega$,

(ii) $E_P W[g(Y), f(X)] \le c$ for all $P \in \mathcal{F}$.

Then $f$ is a minimax prediction for $g(Y)$.

The proof is the exact analogue of that of theorem 2.1.

Corollary 7.2. A constant risk Bayes prediction is a minimax prediction.

Suppose now that $X$ and $Y$ are independent and that $W[g(y), f(x)] =
[g(y) - f(x)]^2$. Consider the problem first from the point of view of unbiasedness.
A prediction could reasonably be called unbiased if $E_P f(X) = E_P g(Y)$. Subject to
unbiasedness, the risk is given by $E_P[g(Y) - f(X)]^2 = \sigma_P^2 f(X) + \sigma_P^2 g(Y)$.
But $\sigma_P^2 g(Y)$ is a known function of $P$, and hence the problem of minimizing
(for a particular $P$) the expected squared error reduces to that of finding an
unbiased estimate of $E_P g(Y)$ with minimum variance at $P$. In a similar way one
sees, without any restriction to unbiased predictions, that the Bayes prediction
for $g(Y)$ is the same as the Bayes estimate for $E_P g(Y)$, and hence that formula
(4.2), with $g(P)$ replaced by $E_P g(Y)$, may be used if the assumptions there made
are valid.

One might expect that as in the unbiased theory the prediction will coincide
with the estimate. This however is not the case since the $\lambda$'s that give constant
risk in the two cases will usually be distinct. In fact the two problems are rather
different in that the "least favorable" $\lambda$ for the prediction problem must not
only take into account the difficulty of finding the correct value of $\theta$ for various
a priori distributions but also the difficulty of predicting $g(Y)$ when $\theta$ is known.

As a first example consider the prediction analogue of problem 1 of section 5.
Let $X$, $Y$ be independent binomial variables such that
$P(X = k) = \binom{m}{k} p^k (1 - p)^{m-k}$ and $P(Y = l) = \binom{n}{l} p^l (1 - p)^{n-l}$. We shall obtain the minimax
prediction of $Y/n$ in a manner quite analogous to the one in which we determined
the minimax estimate of $p$. Actually, the present problem is a generalization of
the earlier one, to which it can be reduced by letting $n \to \infty$. First it is easily seen
that

$$E\left( \alpha \frac{X}{m} + \beta - \frac{Y}{n} \right)^2$$

is a quadratic function of $p$, which when $m > 1$ is constant for

$$\alpha = \frac{m}{m - 1} \left[ 1 - \sqrt{\frac1m + \frac1n - \frac1{mn}}\, \right], \qquad \beta = \frac{1 - \alpha}{2}.$$

But we have already seen that $\alpha \frac{X}{m} + \beta$ is the Bayes solution corresponding to
$C p^{a-1} q^{b-1}$ when

$$\frac{\alpha}{m} = \frac{1}{m + a + b}, \qquad \beta = \frac{a}{m + a + b}.$$

Clearly $a = b$, and $a > 0$ provided $0 < \alpha < 1$, which is easily verified when $m, n > 1$.
We note that as $n \to \infty$, the values of $\alpha$, $\beta$ tend to those of the minimax estimate
of $p$.

When $m = 1$, $E\left( \alpha \frac{X}{m} + \beta - \frac{Y}{n} \right)^2$ is constant for $\alpha = \frac{n - 1}{2n}$, $\beta = \frac{1 - \alpha}{2}$,
and again $\alpha \frac{X}{m} + \beta$ is the Bayes estimate for a beta distribution when $n > 1$, and
hence minimax.

Finally in the case $n = 1$, the situation degenerates. Since $E\left( \frac12 - Y \right)^2 = \frac14$, the
prediction $f(X) = \frac12$ has constant risk. In addition it is the Bayes prediction
corresponding to the distribution which assigns probability 1 to $p = \frac12$. Hence
in this case, regardless of the value of $X$ one would predict for $Y$ the value $\frac12$.
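The constant-risk prediction for $m > 1$ can be checked by exact summation over the joint distribution of $X$ and $Y$. This sketch is illustrative: the function names and the sizes $m = 4$, $n = 9$ are assumptions made here, and the coefficients used are $\alpha = \frac{m}{m-1}\bigl[1 - \sqrt{1/m + 1/n - 1/(mn)}\bigr]$ and $\beta = (1 - \alpha)/2$.

```python
from math import comb, sqrt

def pmf(n, p, k):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prediction_risk(m, n, p, alpha, beta):
    """E[(alpha X/m + beta - Y/n)^2] for independent X ~ Bin(m, p), Y ~ Bin(n, p)."""
    return sum(
        pmf(m, p, k) * pmf(n, p, l) * (alpha * k / m + beta - l / n) ** 2
        for k in range(m + 1)
        for l in range(n + 1)
    )

def minimax_coefficients(m, n):
    """Constant-risk coefficients (alpha, beta); requires m > 1."""
    alpha = m / (m - 1) * (1 - sqrt(1 / m + 1 / n - 1 / (m * n)))
    return alpha, (1 - alpha) / 2

m, n = 4, 9  # illustrative sample sizes
alpha, beta = minimax_coefficients(m, n)
risks = [prediction_risk(m, n, j / 10, alpha, beta) for j in range(11)]

# For very large n the coefficients approach those of the minimax estimate of p.
limit_alpha, limit_beta = minimax_coefficients(m, 10**12)
```

The risks all equal $\beta^2$ up to rounding, and for $m = 4$ the limiting coefficients are close to $\sqrt{m}/(\sqrt{m} + 1) = 2/3$ and $1/(2(\sqrt{m} + 1)) = 1/6$.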

It is interesting that the above prediction problem can be interpreted also as
an estimation problem in the following manner. Suppose a lot of size $N = m + n$
is such that the number of defectives follows a binomial distribution; this is the
case when the items making up the lot are produced by a manufacturing process
that is in statistical control. It is desired to estimate, from a sample of size $m$
taken from this lot, the proportion of defectives in the remainder. That this is
just the prediction problem treated above follows from a remark of
Mood [13] that in such a lot the number of defectives in the sample and in the
remainder are independently distributed according to binomial distributions with
the same parameter $p$.





We can again use the binomial results to obtain the solutions of certain
nonparametric problems. For example, let $X_1, \cdots, X_m$ be independently and
identically distributed on $[0, 1]$ and let $Y_1, \cdots, Y_n$ be another sample from the
same distribution. Then the minimax prediction for $\bar{Y}$ is given by $\alpha \bar{X} + \beta$ with

$$\alpha = \frac{m}{m - 1} \left[ 1 - \sqrt{\frac1m + \frac1n - \frac1{mn}}\, \right], \qquad \beta = \frac{1 - \alpha}{2}.$$

This follows from the fact that

$$E(\alpha \bar{X} + \beta - \bar{Y})^2 = E[\alpha(\bar{X} - \mu) - (\bar{Y} - \mu) + (\beta + (\alpha - 1)\mu)]^2 = \sigma^2 \left( \frac{\alpha^2}{m} + \frac1n \right) + [\beta + (\alpha - 1)\mu]^2 \le \left( \frac{\alpha^2}{m} + \frac1n \right) \mu(1 - \mu) + [\beta + (\alpha - 1)\mu]^2.$$

An analogous modification clearly is possible for theorem 6.4.

For the situation considered in (ii), the prediction problem gives the same
result as the estimation problem. For consider first two samples $X_1, \cdots, X_m$;
$Y_1, \cdots, Y_n$ from a normal distribution with known variance $\sigma^2$. Here

$$E_\theta[f(X_1, \cdots, X_m) - \bar{Y}]^2 = E_\theta[f(X_1, \cdots, X_m) - \theta]^2 + \frac{\sigma^2}{n},$$

and hence the risk differs from that of the estimation problem only by a constant.
Thus $\bar{X}$ is the minimax prediction of $\bar{Y}$, and it is then seen immediately
that it is also the minimax prediction for $\bar{Y}$ when of the underlying common
distribution of the $X$'s and $Y$'s it is assumed only that the variance is bounded.

REFERENCES 

[1] G. W. Brown, "On small sample estimation," Annals of Math. Stat., Vol. 18 (1947), p. 514.

[2] A. Wald, "Contributions to the theory of statistical estimation and testing hypotheses," Annals of Math. Stat., Vol. 10 (1939), p. 299.

[3] A. Wald, On the Principles of Statistical Inference, Notre Dame Math. Lectures, No. 1 (1942).

[4] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.

[5] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), p. 165.

[6] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), p. 427.

[7] P. R. Halmos, "The theory of unbiased estimation," Annals of Math. Stat., Vol. 17 (1946), p. 31.

[8] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses. I. Normal distributions," Annals of Math. Stat., Vol. 19 (1948), p. 495.

[9] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 649.

[10] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

[11] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Math. Stat., Vol. 18 (1947), p. 105.

[12] E. W. Barankin, "Extension of a theorem of Blackwell," Annals of Math. Stat., Vol. 21 (1950), p. 280.

[13] A. M. Mood, "On the dependence of sampling inspection plans upon population distributions," Annals of Math. Stat., Vol. 14 (1943), p. 145.



THE THEORY OF PROBABILITY DISTRIBUTIONS OF POINTS 

ON A LATTICE¹

By P. V. Krishna Iyer

University of Oxford

1. Introduction and summary. This paper discusses the theory of certain
probability distributions arising from points arranged in the form of lattices
in two, three and higher dimensions. The points are of $k$ characters which for
convenience are described as colors. A two dimensional lattice will consist of
$m \times n$ points in $m$ columns and $n$ rows. In a three dimensional lattice there
will be $l \times m \times n$ points in the form of a rectangular parallelopiped. Two
situations arise for consideration. They are, to use the terms of Mahalanobis,
free and non-free sampling. In free sampling the color of each point is determined,
on null hypothesis, independently of the color of the other points. The
probabilities of the points belonging to the different colors, say black, white, etc.
are $p_1, p_2, \cdots, p_k$, such that $\sum_{r=1}^{k} p_r = 1$. In non-free sampling the number of
points of each color is specified in advance, say $n_1, n_2, \cdots, n_k$, so that $\sum_{r=1}^{k} n_r = mn$
or $lmn$ according as the lattice is two- or three-dimensional. Only the
arrangements of these points in the lattice are varied.

The distributions considered in this paper are the following:

(i) the number of joins between adjacent points of the same color, say
black-black joins,

(ii) the number of joins between adjacent points of two specified colors, say
black-white joins, and

(iii) the total number of joins between points of different colors, along
mutually perpendicular axes.

The methods used here are the same as those developed by the author [3]
for the linear case. All the distributions tend to the normal form when $l$, $m$ and $n$
tend to infinity, provided the $p$'s are not very small.

Before considering the various distributions, we shall have a brief review of
the work done on this topic by other people. For free sampling, Moran [5] and
[6] has discussed the distribution of black-white and black-black joins for an
$m \times n$ lattice of points of two colors. For a three-dimensional lattice, he has
given the first and the second moments for the distribution of black-white
joins. Levene [4] has announced some results closely allied to those of Moran
for a square of side $N$ (with $N^2$ cells), each cell taking the characteristic $A$ or $B$
with probabilities $p$ and $q = 1 - p$ respectively. Bose [2] has found the
expectation of

$x$ = the number of black patches - the number of embedded white patches,

for a square divided into $n^2$ small cells, having $p$ and $q = 1 - p$ as the probability
of the cells being black or white. An embedded white patch is one that lies
completely inside a black patch.

The above review shows that the work done so far is confined entirely to the
free sampling distributions, the points taking only two characters. As mentioned
in the beginning of this article, we shall deal here with the free and non-free
sampling distributions for points possessing $k$ characters or colors.

¹ Part of a thesis approved for the degree of Doctor of Philosophy, Oxford University.


2. Two dimensional lattice. Let an $m \times n$ rectangular lattice consist of $mn$
points of $k$ colors with probabilities $p_1, p_2, \cdots, p_k$, such that $\sum p_r = 1$. (When
there are only two colors, $p_1$ and $p_2$ are taken as $p$ and $q$ respectively.) All the
problems dealt with for the linear lattice (Krishna Iyer, [3]) can be investigated
here also. But the most important of them is the distribution for the total number
of joins between points of different colors. This takes into consideration the
relative position of points of all colors in the lattice. Distributions for the number
of black-black or black-white joins are not based on the arrangement of all the
points in the lattice and therefore cannot be considered to be adequate for testing
the random distribution of the points in the lattice. Therefore the distribution
of the total number of joins between points of different colors has been dealt
with in some detail. As the actual distributions are very complicated they
are discussed by means of cumulants. The first and the second moments for
the other distributions have also been given.

2.1. First and second moments for the distribution of black-black joins for two
or more colors. The first and the second moments for free sampling have been
obtained by Moran [5] and [6]. In order to give an idea of the methods used
in this paper for obtaining the moments and also to facilitate the derivation of
the corresponding moments for non-free sampling, they have been obtained
again for both black-black and black-white joins.

(a) Free Sampling. In the course of similar investigations on the distribution
of black-black joins arising from points on a line, the author [3] has found that
the $r$th factorial moment is $r!$ times the sum of expectations of the different
ways of obtaining $r$ joins. This finding is true for the rectangular lattice also.


This may be established as follows.

Define variates $u_{ij}$ ($i = 1, 2, \cdots, n$; $j = 1, 2, \cdots, m - 1$) to be one if
the $(i, j)$ and $(i, j + 1)$ positions are black and zero otherwise; then $E(u_{ij}) = p^2$,
and the higher factorial moments are zero. Similarly, define $v_{ij}$ ($i = 1, 2, \cdots,
n - 1$; $j = 1, 2, \cdots, m$) to be one when the $(i, j)$ and $(i + 1, j)$ positions are
black and zero otherwise; then $E(v_{ij}) = p^2$ and the higher factorial moments
are zero. Further, $u_{ij}$ is independently distributed of all $u$'s and $v$'s except
$u_{i,j-1}$, $u_{i,j+1}$, $v_{i-1,j}$, $v_{i-1,j+1}$, $v_{i,j}$, $v_{i,j+1}$, and $v_{ij}$ is independently
distributed of all $u$'s and $v$'s excepting two vertically adjacent $v$'s and four
horizontally adjacent $u$'s. If

$$s = \sum u_{ij} + \sum v_{ij},$$

then

$$E(s) = \sum E(u_{ij}) + \sum E(v_{ij}) = (2mn - m - n)p^2$$

and

$$E(s^{(2)}) = 2E\left( \text{the number of ways of selecting any two of the ones included in } \sum u_{ij} + \sum v_{ij} \right) = 2E\left( \sum uu' + \sum uv + \sum vv' \right)$$

involves only the cross products since $E(u^{(2)}) = E(v^{(2)}) = 0$. For products of
dependent pairs the expectation is $p^3$, while for independent pairs it is $p^4$. Hence
one merely needs to count the number of dependent and independent products.
Similarly for the third factorial moment one needs consider only products of
three first powers of the variates: those with three independent variates (with
expectation $p^6$), those with two dependent and one independent variates (with
expectation $p^5$), and those with three dependent variates (with expectation $p^4$).

Thus the second factorial moment can be obtained by counting the number
of ways of obtaining two black-black joins from (i) three adjacent points and
(ii) two pairs of adjacent points. They are explained below diagrammatically
for a 5 x 4 lattice.

[Diagram of the configurations (1), (2) and (3) on a 5 x 4 lattice not reproduced.
'X' denotes a black point, '.' denotes any point other than black.]

The expectations for items (1), (2) and (3) indicated above are

(2.1.1)
$[(m - 2)n + (n - 2)m]\,p^3,$
$4(m - 1)(n - 1)\,p^3,$
$\tfrac12 [4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]\,p^4$

respectively. Thus

(2.1.2)  $\mu'_{[2]} = 2[6mn - 6(m + n) + 4]\,p^3 + [4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]\,p^4.$

It can now be seen that

(2.1.3)  $\mu'_1 = (2mn - m - n)p^2,$

(2.1.4)  $\mu_2 = (2mn - m - n)p^2 + 2(6mn - 6m - 6n + 4)p^3 - (14mn - 13m - 13n + 8)p^4.$

Putting $m + n = a$, and $mn = b$, the above expressions reduce to

(2.1.5)  $\mu'_1 = (2b - a)p^2,$

(2.1.6)  $\mu_2 = (2b - a)p^2 + 2(6b - 6a + 4)p^3 - (14b - 13a + 8)p^4.$

These substitutions have been continued throughout this Section.
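Formulas (2.1.5) and (2.1.6) can be confirmed on a small lattice by enumerating every coloring. The following is an illustrative brute-force sketch, with hypothetical function names, a 3 x 3 lattice and $p = \frac13$ chosen here; each point is black with probability $p$ and of some other color otherwise.

```python
from fractions import Fraction
from itertools import product

def lattice_joins(m, n):
    """Pairs of adjacent points in a lattice with m columns and n rows."""
    joins = []
    for i in range(n):
        for j in range(m):
            if j + 1 < m:
                joins.append(((i, j), (i, j + 1)))  # horizontal join
            if i + 1 < n:
                joins.append(((i, j), (i + 1, j)))  # vertical join
    return joins

def black_black_moments(m, n, p):
    """Exact mean and variance of the number of black-black joins (free sampling)."""
    points = [(i, j) for i in range(n) for j in range(m)]
    joins = lattice_joins(m, n)
    mean = second = Fraction(0)
    for colors in product((0, 1), repeat=len(points)):  # 1 marks a black point
        black = dict(zip(points, colors))
        prob = Fraction(1)
        for c in colors:
            prob *= p if c else 1 - p
        s = sum(black[u] and black[v] for u, v in joins)
        mean += prob * s
        second += prob * s * s
    return mean, second - mean**2

m, n, p = 3, 3, Fraction(1, 3)
a, b = m + n, m * n
mean, variance = black_black_moments(m, n, p)
formula_mean = (2 * b - a) * p**2
formula_variance = (
    (2 * b - a) * p**2 + 2 * (6 * b - 6 * a + 4) * p**3 - (14 * b - 13 * a + 8) * p**4
)
```

With exact rational arithmetic the enumerated mean and variance coincide with the two formulas.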

(b) Non-free sampling. The chances of obtaining $r$ black points in free and
non-free sampling are $p^r$ and $n_1^{(r)}/b^{(r)}$ respectively, where $x^{(r)} = x(x - 1) \cdots (x - r + 1)$.
Therefore it is obvious that the $r$th factorial moment about zero for the non-free
sampling distribution of black-black joins can be deduced by substituting
$n_1^{(r)}/b^{(r)}$ for $p^r$ in $\mu'_{[r]}$ for free sampling.
This substitution gives

(2.1.7)  $\mu'_{1(n_1, n_2)} = \dfrac{(2b - a)\, n_1^{(2)}}{b^{(2)}},$

(2.1.8)  $\mu_{2(n_1, n_2)} = \dfrac{(2b - a)\, n_1^{(2)}}{b^{(2)}} + \dfrac{2(6b - 6a + 4)\, n_1^{(3)}}{b^{(3)}} - \dfrac{\{(14b - 13a + 8) - (2b - a)^2\}\, n_1^{(4)}}{b^{(4)}} - \left\{ \dfrac{(2b - a)\, n_1^{(2)}}{b^{(2)}} \right\}^2,$

where $\mu_{r(n_1, n_2)}$ represents the $r$th moment with $n_1$ black and $n_2$ white points
on the lattice.
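The non-free sampling moments can be checked in the same spirit by enumerating all placements of $n_1$ black points. This sketch is illustrative only: the function names, the 3 x 3 lattice and $n_1 = 4$ are assumptions made here, and $x^{(r)}$ is taken to be the falling factorial.

```python
from fractions import Fraction
from itertools import combinations

def falling(x, r):
    """Falling factorial x^(r) = x (x - 1) ... (x - r + 1)."""
    out = 1
    for t in range(r):
        out *= x - t
    return out

def lattice_joins(m, n):
    """Pairs of adjacent points in a lattice with m columns and n rows."""
    joins = []
    for i in range(n):
        for j in range(m):
            if j + 1 < m:
                joins.append(((i, j), (i, j + 1)))
            if i + 1 < n:
                joins.append(((i, j), (i + 1, j)))
    return joins

def nonfree_moments(m, n, n1):
    """Exact mean and variance of black-black joins over all placements of n1 black points."""
    points = [(i, j) for i in range(n) for j in range(m)]
    joins = lattice_joins(m, n)
    counts = []
    for black in combinations(points, n1):
        chosen = set(black)
        counts.append(sum(u in chosen and v in chosen for u, v in joins))
    total = len(counts)
    mean = Fraction(sum(counts), total)
    return mean, Fraction(sum(c * c for c in counts), total) - mean**2

m, n, n1 = 3, 3, 4
a, b = m + n, m * n
mean, variance = nonfree_moments(m, n, n1)

def ratio(r):
    """n1^(r) / b^(r), the chance that r specified points are all black."""
    return Fraction(falling(n1, r), falling(b, r))

formula_mean = (2 * b - a) * ratio(2)
formula_variance = (
    (2 * b - a) * ratio(2)
    + 2 * (6 * b - 6 * a + 4) * ratio(3)
    + ((2 * b - a) ** 2 - (14 * b - 13 * a + 8)) * ratio(4)
    - formula_mean**2
)
```

The enumeration covers the 126 placements of four black points among nine, and both moments agree exactly with the substituted formulas.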

2.2. Cumulants for the distribution of black-white joins for two colors. For $m$
points on a line, the author [3] has shown that the first four cumulants of the
free and non-free sampling distribution of black-white joins can be obtained
from the non-free distributions for $(1, m - 1)$, $(2, m - 2)$, $(3, m - 3)$
and $(4, m - 4)$ black and white points distributed at random. This method is
applicable for two and three dimensional lattices also. This can be established from
the following considerations.

(i) The $r$th moment about zero for the free sampling distribution is

$$\sum_{s} \binom{b}{s} p^s q^{b-s}\, S_{r(s,\, b-s)},$$

where $b = mn$ and $S_{r(s,\, b-s)}$ is the $r$th moment for the non-free distribution with
$s$ black and $(b - s)$ white points.

(ii) $S_{r(s,\, b-s)}$ is the same for the two distributions arising from (1) $s$ black and
$(b - s)$ white points and (2) $(b - s)$ black and $s$ white points.

(iii) The $r$th moment is a polynomial in $pq$ of degree $r$. This can be seen from
the fact that the factorial moment is the sum of the expectations of the different
ways of obtaining $r$ black-white joins. The probability of $r$ independent
black-white joins is $(2pq)^r$ and this is the highest power of $pq$.

* This result differs slightly from that given by Moran. The correct result is the one
given here.

In view of the above conditions, (i) reduces to

(2.2.1)  $A_{1r}\, pq\, (p + q)^{b-2} + A_{2r}\, p^2q^2 (p + q)^{b-4} + \cdots + A_{rr}\, p^rq^r (p + q)^{b-2r} = A_{1r}\, pq + A_{2r}\, p^2q^2 + \cdots + A_{rr}\, p^rq^r,$

where $A_{1r}$, $A_{2r}$ etc. are determined from the following relations:

(2.2.2)
$\binom{b}{1} S_{r(1,\, b-1)} = A_{1r},$
$\binom{b}{2} S_{r(2,\, b-2)} = A_{2r} + \binom{b-2}{1} A_{1r},$
$\binom{b}{3} S_{r(3,\, b-3)} = A_{3r} + \binom{b-4}{1} A_{2r} + \binom{b-2}{2} A_{1r},$
$\binom{b}{4} S_{r(4,\, b-4)} = A_{4r} + \binom{b-6}{1} A_{3r} + \binom{b-4}{2} A_{2r} + \binom{b-2}{3} A_{1r},$

where $S_{r(t,\, b-t)}$ is the $r$th moment about zero for the non-free distribution with
$t$ black and $(b - t)$ white points. This is obvious by comparing the coefficients
of $p^t q^{b-t}$ in (i) with (2.2.1).

Therefore the first four cumulants can be calculated by finding the frequency
distributions of black-white joins for $(1, b - 1)$, $(2, b - 2)$, $(3, b - 3)$ and
$(4, b - 4)$ black and white points. These distributions were determined by a
systematic examination of the number of black-white joins in all the possible
arrangements for the given number of black and white points. The moments
of these distributions enable us to determine the $A$'s.

The equations in (2.2.2) give 

$A_{11} = 2(2b - a),$

$A_{12} = 2(8b - 7a + 4),$

$A_{13} = 2(32b - 37a + 36),$

$A_{14} = 2(128b - 175a + 220),$

$A_{22} = 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8),$

$A_{23} = 4(21a^2 - 66ab + 48b^2 + 210a - 156b - 228),$

$A_{33} = 8(-a^3 + 6a^2b - 12ab^2 + 8b^3 - 39a^2 + 120ab - 84b^2 - 272a + 184b + 312),$

$A_{24} = 4(295a^2 - 760ab + 448b^2 + 2305a - 1304b - 3428),$

$A_{34} = 8(-42a^3 + 216a^2b - 360ab^2 + 192b^3 - 1410a^2 + 3612ab - 2016b^2 - 7884a + 3648b + 12720),$

$A_{44} = 16(a^4 - 8a^3b + 24a^2b^2 - 32ab^3 + 16b^4 + 78a^3 - 396a^2b + 648ab^2 - 336b^3 + 1643a^2 - 4196ab + 2252b^2 + 7926a - 3084b - 13464),$

where a = m + n, and b = mn.

The above values of the A's give the first four moments for free sampling about
zero. The cumulants reduce to the following expressions:

(2.2.3) $\kappa_1 = 2(2b - a)pq,$

(2.2.4) $\kappa_2 = 2(8b - 7a + 4)pq - 4(14b - 13a + 8)p^2q^2,$

(2.2.5) $\kappa_3 = 2(32b - 37a + 36)pq - 8(90b - 111a + 114)p^2q^2 + 64(29b - 37a + 39)p^3q^3,$

(2.2.6) $\kappa_4 = 2(128b - 175a + 220)pq - 4(1784b - 2617a + 3476)p^2q^2 + 32(1548b - 2361a + 3228)p^3q^3 - 32(3126b - 4899a + 6828)p^4q^4.$
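The cumulant formulas can be checked numerically on a small lattice. The sketch below is not part of the paper: it enumerates every colouring of an m × n lattice under free sampling and compares the first three cumulants with (2.2.3) to (2.2.5). The function names are hypothetical, the coefficients are those read in this section, and the comparison assumes m, n ≥ 3, which the counting behind the third cumulant appears to require.

```python
from fractions import Fraction
from itertools import product


def lattice_joins(m, n):
    """Pairs of adjacent points in an m x n lattice (points numbered row by row)."""
    joins = []
    for i in range(m):
        for j in range(n):
            if j + 1 < n:
                joins.append((i * n + j, i * n + j + 1))    # join to the right
            if i + 1 < m:
                joins.append((i * n + j, (i + 1) * n + j))  # join downwards
    return joins


def enumerated_cumulants(m, n, p):
    """First three cumulants of the number of black-white joins, by full enumeration."""
    joins = lattice_joins(m, n)
    q = 1 - p
    raw = [Fraction(0), Fraction(0), Fraction(0)]
    for colours in product((0, 1), repeat=m * n):      # 1 = black, 0 = white
        black = sum(colours)
        weight = p ** black * q ** (m * n - black)      # free sampling probability
        x = sum(colours[u] != colours[v] for u, v in joins)
        for r in range(3):
            raw[r] += weight * x ** (r + 1)
    m1, m2, m3 = raw
    return m1, m2 - m1 ** 2, m3 - 3 * m2 * m1 + 2 * m1 ** 3


def formula_cumulants(m, n, p):
    """Cumulants (2.2.3)-(2.2.5), with the coefficients as read in this section."""
    a, b, t = m + n, m * n, p * (1 - p)
    k1 = 2 * (2 * b - a) * t
    k2 = 2 * (8 * b - 7 * a + 4) * t - 4 * (14 * b - 13 * a + 8) * t ** 2
    k3 = (2 * (32 * b - 37 * a + 36) * t
          - 8 * (90 * b - 111 * a + 114) * t ** 2
          + 64 * (29 * b - 37 * a + 39) * t ** 3)
    return k1, k2, k3
```

Passing `p` as a `Fraction` keeps the comparison exact, so any disagreement would point to a misread coefficient and not to rounding.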


As indicated for black-black joins, the first and the second moments for
non-free sampling can be calculated by substituting

$$p^{r} q^{s} = \frac{n_1^{(r)}\, n_2^{(s)}}{b^{(r+s)}}$$

in the uncorrected moments about the origin for free sampling. This is true for
all the distributions considered in this paper.
Before proceeding to discuss the limiting form of the distribution, it may be
noted that the first four cumulants for the free-sampling distribution of black-white
joins are linear expressions in a and b. This result is similar to what has
been established for the linear lattice (Krishna Iyer, [3]). When the points
lie on a line, all the cumulants of the distribution of the number of joins (black-black
or black-white) are linear in m (the number of points on the line). This
suggests that the higher order cumulants for the distribution of joins in a rectangular
lattice also will be linear in a and b, i.e. the rth cumulant will be of the
form

$$\sum_{s} (L_{rs}\, b + M_{rs}\, a + N_{rs})\, p^{s} q^{s},$$

where L, M and N are independent of a and b. It has not been possible to obtain
a formal proof for this statement.

The limiting form of the distribution of the number of black-white joins is
now examined on the basis of the cumulants given above. Since $\kappa_2$, $\kappa_3$ and $\kappa_4$ are
linear in a and b, $\gamma_1$ and $\gamma_2$ tend to the limit zero as m and n tend to infinity.
That the higher order $\gamma$'s also tend to the limit zero can be seen from the fact
that all the cumulants will be linear functions in a and b. Hence the distribution
of

$$y = \frac{x - 2(2b - a)pq}{\sqrt{2(8b - 7a + 4)pq - 4(14b - 13a + 8)p^2q^2}}$$

tends to the normal form as m and n tend to infinity, where x is the observed
number of black-white joins in a given arrangement of the points.
When p = q = 1/2, the first, second and third cumulants are equal to those
obtained for a binomial distribution whose N is (2b − a).

As in the case of linear lattices, the distribution of the number of black-white
joins in an m × n rectangular lattice for non-free sampling also will tend
to the normal form as m and n tend to infinity.


TABLE 1

Distribution of the number of black-white joins for 2 × 3 lattice

No. of B-W          No. of black points
 joins    0    1    2    3    4    5    6   Total

     0    1    -    -    -    -    -    1       2
     1    -    -    -    -    -    -    -       -
     2    -    4    2    -    2    4    -      12
     3    -    2    4    6    4    2    -      18
     4    -    -    5    8    5    -    -      18
     5    -    -    4    4    4    -    -      12
     6    -    -    -    -    -    -    -       -
     7    -    -    -    2    -    -    -       2

                                               64

κ1 = 7/2, κ2 = 7/4, κ3 = 0, κ4 = 17/8.


In order to have an idea of the nature of the distribution of the number of
black-white joins when p = q or otherwise, the complete distributions for the
lattices 2 × 3, 2 × 4, 3 × 3 and 3 × 4 are given in Tables 1, 2, 3, and 4.
The distributions tabulated in Tables 1, 2, 3 and 4 show that the probability
of getting 1 and (2b − a − 1) black-white joins is zero, while for 0 and
(2b − a) joins it is not so. But this abnormality will not affect the limiting
form of the distribution when m and n tend to infinity because the probability
for 0 and (2b − a) black-white joins also tends to zero.
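Tables of this kind can be regenerated by brute force. The following sketch, which is an illustration and not the author's method of tabulation, counts the colourings of an m × n lattice by number of black points and number of black-white joins, and then computes the cumulants for p = q = 1/2, where every colouring is equally likely. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bw_join_table(m, n):
    """table[x][k] = number of colourings with k black points and x black-white joins."""
    joins = []
    for i in range(m):
        for j in range(n):
            if j + 1 < n:
                joins.append((i * n + j, i * n + j + 1))
            if i + 1 < m:
                joins.append((i * n + j, (i + 1) * n + j))
    table = {}
    for colours in product((0, 1), repeat=m * n):   # 1 = black, 0 = white
        x = sum(colours[u] != colours[v] for u, v in joins)
        k = sum(colours)
        row = table.setdefault(x, {})
        row[k] = row.get(k, 0) + 1
    return table


def cumulants_equal_chances(table):
    """First four cumulants of the number of joins when p = q = 1/2."""
    totals = {x: sum(row.values()) for x, row in table.items()}
    count = sum(totals.values())
    m1, m2, m3, m4 = (Fraction(sum(f * x ** r for x, f in totals.items()), count)
                      for r in (1, 2, 3, 4))
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
    return m1, k2, k3, k4
```

For the 2 × 3 lattice the rows for 1 and 6 joins are absent from the returned dictionary, which is the zero-probability feature noted above.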

2.3. First and second moments for the distribution of black-white joins for k
colors. Free sampling. Taking $p_1$ and $p_2$ as the probabilities that a point in the
lattice is black or white, the expected number of black-white joins is

(2.3.1) $2(2b - a)p_1 p_2.$





TABLE 2

Distribution of the number of black-white joins for 2 × 4 lattice

No. of B-W               No. of black points
 joins    0    1    2    3    4    5    6    7    8   Total

     0    1    -    -    -    -    -    -    -    1       2
     1    -    -    -    -    -    -    -    -    -       -
     2    -    4    2    -    2    -    2    4    -      14
     3    -    4    4    4    -    4    4    4    -      24
     4    -    -    8   12    8   12    8    -    -      48
     5    -    -   12   16   24   16   12    -    -      80
     6    -    -    2   12   20   12    2    -    -      48
     7    -    -    -    8    8    8    -    -    -      24
     8    -    -    -    4    6    4    -    -    -      14
     9    -    -    -    -    -    -    -    -    -       -
    10    -    -    -    -    2    -    -    -    -       2

                                                        256

κ1 = 5, κ2 = 5/2, κ3 = 0, κ4 = 13/4.



TABLE 3

Distribution of the number of black-white joins for 3 × 3 lattice

No. of B-W                  No. of black points
 joins    0    1    2    3    4    5    6    7    8    9   Total

     0    1    -    -    -    -    -    -    -    -    1       2
     1    -    -    -    -    -    -    -    -    -    -       -
     2    -    4    -    -    -    -    -    -    4    -       8
     3    -    4    8    4    -    -    4    8    4    -      32
     4    -    1    6    4   12   12    4    6    1    -      46
     5    -    -   12   24   12   12   24   12    -    -      96
     6    -    -   10   26   36   36   26   10    -    -     144
     7    -    -    -   12   36   36   12    -    -    -      96
     8    -    -    -   10   13   13   10    -    -    -      46
     9    -    -    -    4   12   12    4    -    -    -      32
    10    -    -    -    -    4    4    -    -    -    -       8
    11    -    -    -    -    -    -    -    -    -    -       -
    12    -    -    -    -    1    1    -    -    -    -       2

                                                             512

κ1 = 6, κ2 = 3, κ3 = 0, κ4 = 4.5.

















TABLE 4

Distribution of the number of black-white joins for 4 × 3 lattice

No. of B-W                         No. of black points
 joins    0    1    2    3    4    5    6    7    8    9   10   11   12   Total

     0    1    -    -    -    -    -    -    -    -    -    -    -    1       2
     1    -    -    -    -    -    -    -    -    -    -    -    -    -       -
     2    -    4    -    -    -    -    -    -    -    -    -    4    -       8
     3    -    6    8    2    -    -    2    -    -    2    8    6    -      34
     4    -    2    8    8   10    4    -    4   10    8    8    2    -      64
     5    -    -   22   28   10   18   16   18   10   28   22    -    -     172
     6    -    -   22   46   56   42   30   42   56   46   22    -    -     362
     7    -    -    6   52   88   88  120   88   88   52    6    -    -     588
     8    -    -    -   50  119  162  156  162  119   50    -    -    -     818
     9    -    -    -   28  104  184  186  184  104   28    -    -    -     818
    10    -    -    -    6   58  134  192  134   58    6    -    -    -     588
    11    -    -    -    -   32   88  122   88   32    -    -    -    -     362
    12    -    -    -    -   16   46   48   46   16    -    -    -    -     172
    13    -    -    -    -    2   14   32   14    2    -    -    -    -      64
    14    -    -    -    -    -    8   18    8    -    -    -    -    -      34
    15    -    -    -    -    -    4    -    4    -    -    -    -    -       8
    16    -    -    -    -    -    -    -    -    -    -    -    -    -       -
    17    -    -    -    -    -    -    2    -    -    -    -    -    -       2

                                                                           4096

κ1 = 8.5, κ2 = 4.25, κ3 = 0, κ4 = 6.875.

TABLE 5

Frequency distribution of the total number of joins between points of different
colors for 1 black, 1 white and (mn − 2) red points

No. of joins    Frequency

     4          28
     5          4(5a − 26)
     6          2(2a² − 25a + 4b + 56)
     7          2(−4a² + 2ab + 17a − 6b − 12)
     8          (4a² − 4ab + b² − 4a + 3b − 12)


As in the case of black-black joins, the second factorial moment about zero
is twice the sum of the expectations of the different ways of forming two black-white
joins and can be determined by the method described in section 2.1.

(2.3.2) $\mu'_{[2]} = 2(6b - 6a + 4)\,p_1p_2(p_1 + p_2) + 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8)\,p_1^2p_2^2.$

From this, $\mu_2$ works out to be

(2.3.3) $\mu_2 = 2(2b - a)\,p_1p_2 + 2(6b - 6a + 4)\,p_1p_2(p_1 + p_2) - 4(14b - 13a + 8)\,p_1^2p_2^2.$

2.4. First and second moments for the distribution of the total number of joins
between points of different colors for three colors. The expectation for free sampling
is

(2.4.1) $\mu_1' = 2(2b - a)\sum p_r p_s.$

The coefficients of pq and p²q² in the second moment are the same as those for
two colors. The coefficient of $p_1p_2p_3$ can be obtained from the frequency distribution
of the total number of joins between points of different colors when there
are 1 black, 1 white, and (mn − 2) red points in the lattice. See Table 5.
Defining $S_{2(1,1,b-2)} = \sum x^2 f_x$ for the above distribution,

$S_{2(1,1,b-2)} = 2(4a^2 - 30ab + 32b^2 + 55a - 54b - 32).$

As in the case of two colors, the second moment about zero for three colors
reduces to the form

$A_{12}(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s + A_{112}(p_1 + p_2 + p_3)^{b-3}p_1p_2p_3 + A_{22}(p_1 + p_2 + p_3)^{b-4}\sum p_r^2p_s^2 = A_{12}\sum p_rp_s + A_{112}\,p_1p_2p_3 + A_{22}\sum p_r^2p_s^2,$

since $p_1 + p_2 + p_3 = 1$.

The coefficient of $p_1p_2p_3^{\,b-2}$ on the left hand side of the above equation is equal to
$S_{2(1,1,b-2)}$, i.e. $S_{2(1,1,b-2)}$ = sum of the coefficients of $p_1p_2p_3^{\,b-2}$ in
$A_{12}(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s$ and $A_{112}(p_1 + p_2 + p_3)^{b-3}p_1p_2p_3$. Therefore the coefficient of $p_1p_2p_3$ in
$\mu_2$ is $S_{2(1,1,b-2)}$ − coefficient of $p_1p_2p_3^{\,b-2}$ in $2(8b - 7a + 4)(p_1 + p_2 + p_3)^{b-2}\sum p_rp_s$ −
coefficient of $p_1p_2p_3$ in

$4(2b - a)^2\left(\sum p_rp_s\right)^2 = S_{2(1,1,b-2)} - 2(8b - 7a + 4)(2b - 3) - 8(2b - a)^2 = 4(17a - 19b - 10).$

It can now be seen that

(2.4.2) $\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(14b - 13a + 8)\sum p_r^2p_s^2 - 4(19b - 17a + 10)\,p_1p_2p_3.$

2.5. First and second moments for the distribution of the total number of joins
between points of different colors for k colors. As in the previous cases, the expectation
for free sampling is

(2.5.1) $2(2b - a)\sum p_rp_s.$

The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the second moment are the
same as those for three colors. The coefficient of $\sum p_rp_sp_tp_u$ is determined by finding
the distribution of joins between points of different colors when there are 1
black, 1 white, 1 red and (mn − 3) green points in the lattice. See Table 6.

$S_{2(1,1,1,mn-3)} = 2(12a^2b - 69ab^2 + 72b^3 - 36a^2 + 330ab - 342b^2 - 408a + 348b + 240).$

The coefficient of $\sum p_rp_sp_tp_u$ in $\mu_2$ can be obtained on the same lines as explained
for three colors and is equal to $S_{2(1,1,1,mn-3)}$ − coefficient of $p_rp_sp_tp_u^{\,mn-3}$ in
the homogeneous expression of degree mn in $\mu_2'$ for three colors $+\; 8(2b - a)^2$

$= 8(14b - 13a + 8).$


TABLE 6

Frequency distribution of the total number of joins between points of different
colors when there are 1 black, 1 white, 1 red and (mn − 3) green points

No. of joins    Frequency

     6          240
     7          12(19a − 112)
     8          12(6a² − 78a + 7b + 208)
     9          4(2a³ − 57a² + 15ab + 310a − 66b − 444)
    10          6(−4a³ + 2a²b + 36a² − 21ab + 2b² − 86a + 36b + 72)
    11          6(4a³ − 4a²b + ab² − 6a² + 8ab − 2b² − 10a − 40)
    12          (−8a³ + 12a²b − 6ab² + b³ − 24a² + 18ab − 3b² + 44a − 34b + 192)


It follows now that

$\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(19b - 17a + 10)\sum p_rp_sp_t - 4(14b - 13a + 8)\sum p_r^2p_s^2 + 8(14b - 13a + 8)\sum p_rp_sp_tp_u.$
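As a numerical illustration (not part of the paper), the second moment for k colors can be compared with an exact enumeration on a small lattice. The sketch assumes the coefficients as read here; the function names and the chosen probabilities are hypothetical. A 2 × 3 lattice with four colors needs only 4⁶ colourings.

```python
from fractions import Fraction
from itertools import combinations, product


def lattice_joins(m, n):
    """Pairs of adjacent points in an m x n lattice (points numbered row by row)."""
    joins = []
    for i in range(m):
        for j in range(n):
            if j + 1 < n:
                joins.append((i * n + j, i * n + j + 1))
            if i + 1 < m:
                joins.append((i * n + j, (i + 1) * n + j))
    return joins


def enumerated_variance(m, n, probs):
    """Exact variance of the number of joins between points of different colors."""
    joins = lattice_joins(m, n)
    mean = Fraction(0)
    second = Fraction(0)
    for colours in product(range(len(probs)), repeat=m * n):
        weight = Fraction(1)
        for c in colours:
            weight *= probs[c]                      # free sampling: independent points
        x = sum(colours[u] != colours[v] for u, v in joins)
        mean += weight * x
        second += weight * x * x
    return second - mean ** 2


def formula_variance(m, n, probs):
    """Second moment about the mean for k colors, coefficients as read in section 2.5."""
    a, b = m + n, m * n
    s2 = sum(p * q for p, q in combinations(probs, 2))
    s3 = sum(p * q * r for p, q, r in combinations(probs, 3))
    s4 = sum(p * q * r * s for p, q, r, s in combinations(probs, 4))
    s22 = sum((p * q) ** 2 for p, q in combinations(probs, 2))
    return (2 * (8 * b - 7 * a + 4) * s2 - 4 * (19 * b - 17 * a + 10) * s3
            - 4 * (14 * b - 13 * a + 8) * s22 + 8 * (14 * b - 13 * a + 8) * s4)
```

With three colors the four-color sum is empty, so the same function also reproduces (2.4.2).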

In general the cumulants³ for free sampling involve b and a in the first degree
only, and therefore, when m and n are large, the distribution tends to the normal
form. If x is the observed total number of joins between points of different
colors, the distribution of

$$\frac{x - 2(2b - a)\sum p_rp_s}{\sqrt{b}}$$


³ The author has recently obtained the third and fourth cumulants for this distribution.
They are linear functions of the dimensions of the lattice. The results will be published in
an early issue of the Ind. J. Agric. Stat.





tends to the normal form with

$$16\sum p_rp_s - 76\sum p_rp_sp_t - 56\sum p_r^2p_s^2 + 112\sum p_rp_sp_tp_u$$

as its variance for large values of m and n.

For non-free sampling also, the distribution of

$$\frac{x - 2(2mn - m - n)\sum \epsilon_r\epsilon_s}{\sqrt{mn}},$$

where $\epsilon_r = n_r/mn$, approaches the normal form having

$$4\sum \epsilon_r\epsilon_s\epsilon_t + 8\sum \epsilon_r^2\epsilon_s^2 - 16\sum \epsilon_r\epsilon_s\epsilon_t\epsilon_u$$

as its variance. The error of this variance will be about 5% or less when m
and n are greater than 35.

3. Three- and higher-dimensional lattices. This section deals with the first
and the second moments for the distribution of black-black, black-white and
the total number of joins between points of different colors for three- and higher-dimensional
lattices. Besides these, the third and the fourth cumulants for the
distribution of black-white joins in a three-dimensional lattice with points
of two colors are also given.

3.1. First and second moments for the distribution of black-black joins. Free
sampling. Let $E_3(1)$ be the expectation of the number of black-black joins
for a lattice of sides l, m and n. Further let $A_2$ and $A_3$ be the number of ways
of obtaining a black-black join in m × n and l × m × n lattices. Then

$E_3(1) = A_3\,p^2,$

$A_3 = A_2\,l + mn(l - 1),$

and

$A_2 = (2mn - m - n).$

Therefore

(3.1.1) $E_3(1) = (3lmn - lm - mn - nl)\,p^2.$

For the sake of convenience all the results for the three-dimensional lattice
are expressed after making the following substitutions:

c = l + m + n,

d = lm + mn + nl,

e = lmn.

$E_3(1)$ in terms of c, d and e is

$(3e - d)\,p^2.$



The expectation of the number of black-black joins for a lattice of r dimensions
($l_1 \times l_2 \times \cdots \times l_r$) is given by

(3.1.2) $E_r(1) = \left(r\,l_1l_2\cdots l_r - \sum l_1l_2\cdots l_{r-1}\right)p^2,$

where $\sum l_1l_2\cdots l_{r-1}$ is the sum of the products of the sides taken (r − 1) at a
time.

It has been pointed out before that the second factorial moment is twice
the sum of the expectations of the different ways of forming two black-black
joins. Using this fact, if $2B_2$, $2B_3$, etc., are the coefficients of $p^3$ in the second
factorial moment for two-, three- and higher-dimensional lattices, it will be
found by direct enumeration made in succession from lattices of lower dimensions
that

$B_r = B_{r-1}\,l_r + 4A_{r-1}(l_r - 1) + l_1l_2\cdots l_{r-1}(l_r - 2).$

This can be established from the following considerations. 1) Two black-black
joins can be obtained from three black points situated close to one another
and the chance of having three black points in a specified manner is $p^3$. 2) The
number of ways of getting two black-black joins from three points in the lattice
is

$B_{r-1}\,l_r + 4A_{r-1}(l_r - 1) + l_1l_2\cdots l_{r-1}(l_r - 2).$

$C_r$, the coefficient of $p^4$ in the corrected second moment, is given by the equation

$C_r = -(2B_r + A_r).$

This follows from the fact that the sum of the coefficients of $p^3$ and $p^4$ in the
uncorrected factorial moment, about zero, is twice the number of ways of selecting
two joins from the total number of joins in the lattice, which is $A_r(A_r - 1)$.
Thus

(3.1.3) $A_r\,p^2 + 2B_r\,p^3 + C_r\,p^4$

is the corrected second moment for the distribution of black-black joins in a
lattice of r dimensions. For an l × m × n lattice

(3.1.4) $\mu_2 = (3e - d)\,p^2 + 2(15e - 10d + 4c)\,p^3 - (33e - 21d + 8c)\,p^4.$
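The recursions for $A_r$ and $B_r$ lend themselves to a short computational check. In the sketch below (an illustration only), `recursive_counts` builds the lattice one dimension at a time, starting from a single point with no joins, which is an assumption made here for convenience; `direct_counts` counts the joins and the pairs of joins sharing a point from the point degrees. Both function names are hypothetical, and every side is assumed to be at least 2.

```python
from itertools import product


def recursive_counts(sides):
    """A_r and B_r from the recursions of section 3.1 (every side assumed >= 2)."""
    A, B, volume = 0, 0, 1          # a single point: no joins, no pairs of joins
    for l in sides:
        # the right-hand side uses the values for the lattice of lower dimension
        A, B = A * l + volume * (l - 1), B * l + 4 * A * (l - 1) + volume * (l - 2)
        volume *= l
    return A, B


def direct_counts(sides):
    """Number of joins, and number of pairs of joins that share a point."""
    joins = 0
    pairs = 0
    for point in product(*(range(l) for l in sides)):
        degree = 0
        for axis, l in enumerate(sides):
            if point[axis] + 1 < l:
                degree += 1
                joins += 1          # count each join once, from its lower end
            if point[axis] > 0:
                degree += 1
        pairs += degree * (degree - 1) // 2
    return joins, pairs
```

For a 2 × 3 × 4 lattice both routes give 46 joins and 136 adjacent pairs, in agreement with 3e − d and 15e − 10d + 4c for c = 9, d = 26, e = 24.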

3.2. Cumulants for the distribution of black-white joins for two colors. The
first four cumulants for free and non-free sampling distributions in an l × m × n
lattice can be determined from the frequency distributions of black-white joins
of (1, lmn − 1), (2, lmn − 2), (3, lmn − 3) and (4, lmn − 4) black and white
points by the method described for linear and rectangular lattices. If

$\mu_r' = A''_{1r}\,pq + A''_{2r}\,p^2q^2 + \cdots + A''_{rr}\,p^rq^r,$

the first three distributions give the coefficients of pq, p²q² and p³q³ in the first
three moments about zero. The three cumulants calculated from these moments
are given below in terms of c, d, and e for free sampling.

(3.2.1) $\kappa_1 = 2(3e - d)pq,$

(3.2.2) $\kappa_2 = 2(18e - 11d + 4c)pq - 4(33e - 21d + 8c)p^2q^2,$

(3.2.3) $\kappa_3 = 2(108e - 91d + 60c - 24)pq - 8(327e - 288d + 198c - 84)p^2q^2 + 32(219e - 197d + 138c - 60)p^3q^3.$
The calculation of the fourth cumulant by the direct method of finding the
frequency distribution of the number of black-white joins for 4 black and (lmn − 4)
white points was found to be very laborious and therefore this has been calculated
by a special method. The coefficients of pq, p²q² and p³q³ have been determined,
as in other cases, by finding $\sum x^4 f_x$ for the first three distributions. These
coefficients reduce to a linear form in c, d and e. Now the fourth cumulant being
a linear function of these quantities, the coefficient of p⁴q⁴ involves c, d and e
in the first degree only and therefore this can be assumed to be of the form

$\alpha e + \beta d + \gamma c + \delta,$

where α, β, γ and δ are constants. No simple proof can be given here regarding
the linear assumption of the cumulants. It may be observed that this is true of
the first four cumulants for linear and rectangular lattices. The author [3] has
already provided a general proof of this assumption for the linear lattice and he
hopes to extend this for the higher dimensional lattices in the near future.⁴

The constants α, β, γ and δ can be determined by finding $\kappa_4$ for p = q = 1/2
from the frequency distributions of black-white joins for 2 × 2 × 2 and
2 × 2 × 3 lattices for two colors as given in Tables 7 and 8.

When p = q = 1/2, $\kappa_4$ reduces to the form a'e + b'd + c'c + d', where a', b',
c' and d' are constants. In view of this relation, if m and n are fixed, and l takes
values 1, 2, 3, etc., the values of $\kappa_4$ for the different lattices should be in arithmetic
progression. This can be seen by comparing the values of $\kappa_4$ for the lattices
1 × 2 × 2, 2 × 2 × 2 and 3 × 2 × 2 which are 1, 7.5 and 14, respectively.
Using this property, it is possible to find $\kappa_4$ for a lattice of any size from the complete
distribution of the lattices 1 × 2 × 2, 1 × 2 × 3, 1 × 3 × 3, and 2 × 2 × 2
given before. Thus $\kappa_4$ for 2 × 2 × 2, 2 × 2 × 3, 3 × 3 × 2 and 3 × 3 × 3
lattices are 7.5, 14, 25.875 and 47.25 respectively. Now α, β, γ and δ can be obtained
by equating the general expression for the fourth cumulant to the values
given above for the corresponding values of l, m and n and putting p = q = 1/2.
The equations giving the values of α, β, γ and δ are

(3.2.4)
$$\begin{aligned}
8\theta_1 + 12\theta_2 + 6\theta_3 + \theta_4 &= 7.5,\\
12\theta_1 + 16\theta_2 + 7\theta_3 + \theta_4 &= 14.0,\\
18\theta_1 + 21\theta_2 + 8\theta_3 + \theta_4 &= 25.875,\\
27\theta_1 + 27\theta_2 + 9\theta_3 + \theta_4 &= 47.25,
\end{aligned}$$

⁴ This proof has been obtained recently and will be published soon.





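where

$$\theta_1 = \frac{32 \times 19176 + \alpha}{256}, \quad \theta_2 = \frac{-32 \times 21638 + \beta}{256}, \quad \theta_3 = \frac{32 \times 20952 + \gamma}{256}, \quad \text{and} \quad \theta_4 = \frac{-32 \times 16128 + \delta}{256}.$$

They give

α = −32 × 19143, β = 32 × 21615,

γ = −32 × 20940, and δ = 32 × 16128.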
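The four equations in (3.2.4) are easily solved exactly. The sketch below is an illustration under stated assumptions, not the author's working: it solves for the θ's with a small Gauss-Jordan routine (the helper `solve` is hypothetical), then recovers the coefficients of p⁴q⁴ by removing the contribution of the pq, p²q² and p³q³ terms at p = q = 1/2. The lower-order coefficients are those of the fourth cumulant as read in this section.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# (e, d, c, 1) for the 2x2x2, 2x2x3, 2x3x3 and 3x3x3 lattices
matrix = [[8, 12, 6, 1], [12, 16, 7, 1], [18, 21, 8, 1], [27, 27, 9, 1]]
kappa4 = [Fraction(15, 2), Fraction(14), Fraction(207, 8), Fraction(189, 4)]
theta = solve(matrix, kappa4)

# coefficients of e, d, c and the absolute term in the pq, p^2q^2 and p^3q^3 parts
c1 = [648, -671, 604, -432]            # multiplied by 2 pq
c2 = [9996, -10857, 10196, -7632]      # multiplied by -4 p^2 q^2
c3 = [9144, -10167, 9732, -7416]       # multiplied by 32 p^3 q^3
lower = [Fraction(2 * u, 4) - Fraction(4 * v, 16) + Fraction(32 * w, 64)
         for u, v, w in zip(c1, c2, c3)]

# theta_j = lower_j + (coefficient of p^4 q^4)_j / 256
p4q4 = [256 * (t - low) for t, low in zip(theta, lower)]
```

Under these assumptions `theta` comes out as 33/8, −23/8, 3/2 and 0, and `p4q4` reproduces α, β, γ and δ.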


TABLE 7

Frequency distribution of black-white joins, 2 × 2 × 2 lattice for two colors

No. of black-            No. of black points
white joins    0    1    2    3    4    5    6    7    8   Total

      0        1    -    -    -    -    -    -    -    1       2
      1        -    -    -    -    -    -    -    -    -       -
      2        -    -    -    -    -    -    -    -    -       -
      3        -    8    -    -    -    -    -    8    -      16
      4        -    -   12    -    6    -   12    -    -      30
      5        -    -    -   24    -   24    -    -    -      48
      6        -    -   16    -   32    -   16    -    -      64
      7        -    -    -   24    -   24    -    -    -      48
      8        -    -    -    -   30    -    -    -    -      30
      9        -    -    -    8    -    8    -    -    -      16
     10        -    -    -    -    -    -    -    -    -       -
     11        -    -    -    -    -    -    -    -    -       -
     12        -    -    -    -    2    -    -    -    -       2

                                                             256

κ1 = 6, κ2 = 3, κ3 = 0, κ4 = 7.5.




Thus the general formula for the fourth cumulant is

(3.2.5) $\kappa_4 = 2(648e - 671d + 604c - 432)pq - 4(9996e - 10857d + 10196c - 7632)p^2q^2 + 32(9144e - 10167d + 9732c - 7416)p^3q^3 - 32(19143e - 21615d + 20940c - 16128)p^4q^4.$

For a lattice of $l_1 \times l_2 \times \cdots \times l_r$ dimensions, the first two moments
for the distribution of black-white joins for free sampling are as follows:

(3.2.6) $\mu_1' = 2A_r\,pq,$

(3.2.7) $\mu_2 = 2(A_r + B_r)\,pq + 4C_r\,p^2q^2.$








Like the distributions for linear and rectangular lattices, when l, m and n
tend to infinity, $\gamma_1$ and $\gamma_2$ will tend to zero and therefore the distribution of
black-white joins for an l × m × n lattice also tends to the normal form. The
remarks made in connection with the distribution of black-white joins for a
rectangular lattice are true here also. Here the frequencies for 1, 2, [(3e − d)
− 2] and [(3e − d) − 1] black-white joins are zero, while for 0 and (3e − d)

TABLE 8

Frequency distribution of black-white joins for 2 × 2 × 3 lattice for two colors

No. of black-                      No. of black points
white joins    0    1    2    3    4    5    6    7    8    9   10   11   12   Total

      0        1    -    -    -    -    -    -    -    -    -    -    -    1       2
      1        -    -    -    -    -    -    -    -    -    -    -    -    -       -
      2        -    -    -    -    -    -    -    -    -    -    -    -    -       -
      3        -    8    -    -    -    -    -    -    -    -    -    8    -      16
      4        -    4    8    -    2    -    -    -    2    -    8    4    -      28
      5        -    -    8    8    -    -    -    -    -    8    8    -    -      32
      6        -    -   24   20    8    8   12    8    8   20   24    -    -     132
      7        -    -   24   48   40   40   16   40   40   48   24    -    -     320
      8        -    -    2   52   81   56   68   56   81   52    2    -    -     450
      9        -    -    -   40  104  112  144  112  104   40    -    -    -     656
     10        -    -    -   44  100  188  160  188  100   44    -    -    -     824
     11        -    -    -    8   88  144  176  144   88    8    -    -    -     656
     12        -    -    -    -   36  108  162  108   36    -    -    -    -     450
     13        -    -    -    -   24   88   96   88   24    -    -    -    -     320
     14        -    -    -    -   12   28   52   28   12    -    -    -    -     132
     15        -    -    -    -    -    8   16    8    -    -    -    -    -      32
     16        -    -    -    -    -    4   20    4    -    -    -    -    -      28
     17        -    -    -    -    -    8    -    8    -    -    -    -    -      16
     18        -    -    -    -    -    -    -    -    -    -    -    -    -       -
     19        -    -    -    -    -    -    -    -    -    -    -    -    -       -
     20        -    -    -    -    -    -    2    -    -    -    -    -    -       2

                                                                                4096

κ1 = 10, κ2 = 5, κ3 = 0, κ4 = 14.





they are two. But this irregularity will not affect the limiting form of the distribution
since the relative frequencies tend to zero.

3.3. First and second moments for the distribution of black-white joins for k
colors in an r-dimensional lattice. The results for free sampling follow easily from
a consideration of the expectations of the various ways of obtaining one and
two black-white joins. The expectation of the number of black-white joins is

(3.3.1) $2A_r\,p_1p_2.$








The expectation for two black-white joins is

$$B_r\,p_1p_2(p_1 + p_2) + 4\left[\frac{A_r(A_r - 1)}{2} - B_r\right]p_1^2p_2^2.$$

From this it will follow that the second moment

(3.3.2) $\mu_2 = 2A_r\,p_1p_2 + 2B_r\,p_1p_2(p_1 + p_2) + 4C_r\,p_1^2p_2^2.$

3.4. First and second moments for the distribution of the total number of joins
between points of different colors for an l × m × n lattice for three colors. The expectation
for free sampling is

(3.4.1) $2(3e - d)\sum p_rp_s.$

TABLE 9

Distribution of joins between points of different colors for 1 black, 1 white and
(lmn − 2) red points

                               Frequency for lattices
No. of joins           2 × 2 × 2   2 × 2 × 3   2 × 3 × 3   3 × 3 × 3

     5                        24          16           8           -
     6                        32          56          80         104
     7                         -          56         104         144
     8                         -           4          96         276
     9                         -           -          18         112
    10                         -           -           -          66

Total                         56         132         306         702

Σx²f_x about zero           1752        5416       15778       44136


The second moment will involve terms in $\sum p_rp_s$, $p_1p_2p_3$ and $\sum p_r^2p_s^2$. The coefficients
of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as those for two colors. The coefficient
of $p_1p_2p_3$ can be determined by finding the frequency distribution of joins between
points of different colors when the lattice consists of 1 black, 1 white, and
(lmn − 2) red points. But this straightforward method is cumbersome, and
hence the coefficient of $p_1p_2p_3$ has been determined by finding the distribution for
the special lattices 2 × 2 × 2, 2 × 2 × 3, 2 × 3 × 3, and 3 × 3 × 3. These
results are shown in Table 9.

The coefficients of $p_1p_2p_3$ in the corrected second moment for the above lattices
are obtained by subtracting $2(18e - 11d + 4c)(2e - 3) + 8(3e - d)^2$ from the
moments noted above. This can be seen to be so by comparing the above expression
with the quantity subtracted from the uncorrected second moment for
a two dimensional lattice in section 2.4. The coefficients so obtained for 2 × 2 × 2,





2 × 2 × 3, 2 × 3 × 3, and 3 × 3 × 3 lattices are −336, −640, −1184 and
−2142 respectively. Now the coefficient of $p_1p_2p_3$ in the corrected second moment
is of the form

$\alpha' e + \beta' d + \gamma' c + \delta'.$

The equations obtained by equating this expression to −336, −640, −1184
and −2142 for the respective lattices give α' = −174, β' = 108, γ' = −40

TABLE 10

Distribution of joins between points of different colors when there are 1 black,
1 white, 1 red and (lmn − 3) green points

                               Frequency for lattices
No. of joins           2 × 2 × 2   2 × 2 × 3   2 × 3 × 3   3 × 3 × 3

     7                       144          48           -           -
     8                       144         312         288          72
     9                        48         480         912        1344
    10                         -         432        1344        2664
    11                         -          48        1560        4392
    12                         -           -         720        4584
    13                         -           -          72        3168
    14                         -           -           -        1206
    15                         -           -           -         120

Total                        336        1320        4896       17550

Σx²f_x about zero          20160      110208      531312     2370168

and δ' = 0. Thus the second moment for a lattice with points in three colors is

(3.4.2) $2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)\,p_1p_2p_3 - 4(33e - 21d + 8c)\sum p_r^2p_s^2.$
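The figures behind the coefficient of $p_1p_2p_3$ can be reproduced by enumeration. The sketch below is illustrative only, with hypothetical function names: it places one black and one white point in an l × m × n lattice in every possible way, the rest being red, sums x² over the arrangements, and applies the subtraction described above.

```python
from itertools import permutations, product


def joins_3d(l, m, n):
    """Number of points and list of joins of an l x m x n lattice."""
    points = list(product(range(l), range(m), range(n)))
    index = {pt: i for i, pt in enumerate(points)}
    joins = []
    for (x, y, z) in points:
        for neighbour in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if neighbour in index:
                joins.append((index[(x, y, z)], index[neighbour]))
    return len(points), joins


def sum_x2(l, m, n):
    """Sum of x^2 f_x for 1 black, 1 white and (lmn - 2) red points."""
    count, joins = joins_3d(l, m, n)
    total = 0
    for black, white in permutations(range(count), 2):
        special = (black, white)
        # a join links different colors exactly when it touches the black or white point
        x = sum((u in special) or (v in special) for u, v in joins)
        total += x * x
    return total


def p1p2p3_coefficient(l, m, n):
    """Coefficient of p1 p2 p3 in the corrected second moment, by the subtraction above."""
    c, d, e = l + m + n, l * m + m * n + n * l, l * m * n
    return sum_x2(l, m, n) - 2 * (18 * e - 11 * d + 4 * c) * (2 * e - 3) - 8 * (3 * e - d) ** 2
```

For the four special lattices the result agrees with −174e + 108d − 40c, that is, with −2(87e − 54d + 20c).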

3.5. First and second moments for the distribution of the total number of joins
between points of different colors in an l × m × n lattice, for four or more colors.
The expectations for free sampling are given by the same expression as for three
colors. The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the corrected second
moment are also the same as in section 3.4. The coefficient of $\sum p_rp_sp_tp_u$ can be determined
by the method described in section 3.4 for $p_1p_2p_3$ from the frequency
distributions of joins (Table 10) between points of different colors for 2 × 2 × 2,
2 × 2 × 3, 2 × 3 × 3 and 3 × 3 × 3 lattices when they consist of 1 black, 1
white, 1 red and (e − 3) green points.





The coefficient of $\sum p_rp_sp_tp_u$ in the corrected second moment is obtained by subtracting
(obtained in the same way as for the two dimensional lattice in section
2.5)

$$6(18e - 11d + 4c)(e - 2)^2 + (3e - 8)\left[-2(87e - 54d + 20c) + 8(3e - d)^2\right] - 8(3e - d)^2$$

from the uncorrected values. The values so obtained for the four lattices are
480 (2 × 2 × 2), 928 (2 × 2 × 3), 1736 (2 × 3 × 3) and 3168 (3 × 3 × 3). The
coefficient of $p_rp_sp_tp_u$, as in other cases, being of the form

$\alpha'' e + \beta'' d + \gamma'' c + \delta'',$

α'', β'', γ'' and δ'' can be determined by equating the above expression to 480,
928, 1736 and 3168 for the respective lattices. The coefficient so obtained is

$8(33e - 21d + 8c).$

Hence the second moment for free sampling when the lattice contains points of
four or more colors is

(3.5.1) $2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)\sum p_rp_sp_t - 4(33e - 21d + 8c)\sum p_r^2p_s^2 + 8(33e - 21d + 8c)\sum p_rp_sp_tp_u.$

In general, it will be found that the cumulants involve terms in c, d, e and an
absolute term only. Therefore when l, m and n tend to infinity and the ratios l : m : n
are finite, the distribution of $R - 2(3e - d)\sum p_rp_s$, where R is the total number
of joins of points of different colors, tends to the normal form. When l, m and n
are large,

$$\frac{R - 2(3e - d)\sum p_rp_s}{\sqrt{e}}$$

can be considered to be normally distributed with

(3.5.2) $36\sum p_rp_s - 174\sum p_rp_sp_t - 132\sum p_r^2p_s^2 + 264\sum p_rp_sp_tp_u$

as its variance.

The distribution for non-free sampling here also tends to the normal form for
the same reasons given for the rectangular lattice. As in free sampling, for large
values of l, m and n

$$\frac{R - 2(3e - d)\sum \epsilon_r\epsilon_s}{\sqrt{e}}$$

is distributed normally with the variance

(3.5.4) $6\sum \epsilon_r\epsilon_s\epsilon_t + 12\sum \epsilon_r^2\epsilon_s^2 - 24\sum \epsilon_r\epsilon_s\epsilon_t\epsilon_u,$





where R is the observed number of joins for a given distribution of the points
and $\epsilon_r = n_r/lmn$. The error in this variance will be about 5% or less when l, m and
n are greater than 36.

We may conclude this section by giving the first and the second moments for
free sampling with k colors for an r-dimensional lattice.

(3.5.5) $\mu_1' = 2A_r\sum p_rp_s,$

(3.5.6) $\mu_2 = 2(A_r + B_r)\sum p_rp_s + 2(3B_r + 4C_r)\sum p_rp_sp_t + 4C_r\sum p_r^2p_s^2 - 8C_r\sum p_rp_sp_tp_u,$

where $A_r$, $B_r$ and $C_r$ are as defined in section 3.1.
This can be seen from the following facts:

(1) The coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as for two colors.

(2) The coefficient of $\sum p_rp_sp_t$ is the number of ways of getting two joins of
different colors from combination of points not included in $\sum p_rp_sp_tp_u$. This can
be had from three points of three different colors close together and four points
of three different colors separated into groups of two each such that each group
will give one join. The number of arrangements of the first kind is $3!\,B_r$. For
the second kind it is $8(A_r^2 + C_r)$. Subtracting from the total number the contribution
of $\sum p_rp_sp_t$ in the correction factor $4A_r^2\left(\sum p_rp_s\right)^2$, the coefficient of $\sum p_rp_sp_t$
in the second moment works out to be

$2(3B_r + 4C_r).$

(3) The coefficient of $\sum p_rp_sp_tp_u$, as in all other cases dealt before, is twice that
of $\sum p_r^2p_s^2$ with an opposite sign.

Acknowledgements. The author's thanks are due to Dr. D. J. Finney for
suggesting this problem and for all the facilities and help given to him in carrying
out the investigations discussed in this paper. The author is also grateful to
Mr. P. A. P. Moran for explaining the results of his investigations on this problem
and for the interest taken by him in the course of the research.

REFERENCES

[1] L. v. Bortkiewicz, Die Iterationen, Berlin, 1917.

[2] R. C. Bose, "The patch number problem," Sci. Cult., Vol. 12 (1946), p. 199.

[3] P. V. Krishna Iyer, "The theory of probability distribution of points on a line,"
Indian J. Agric. Stat., Vol. 1 (1948).

[4] H. Levene, "A test of randomness in two dimensions," Ann. Math. Stat., Vol. 17 (1946),
p. 500.

[5] P. A. Moran, "Random associations on a lattice," Proc. Camb. Phil. Soc., Vol. 43 (1947),
p. 321.

[6] P. A. Moran, "The interpretation of statistical maps," J. Roy. Stat. Soc., B, Vol. 10
(1948), p. 243.



MINIMAX ESTIMATES OF THE MEAN OF A NORMAL DISTRIBUTION 
WITH KNOWN VARIANCE 

By J. Wolfowitz¹
Columbia University

Summary. It is proved that the classical estimation procedures for the mean
of a normal distribution with known variance are minimax solutions of properly
formulated problems. A result of Stein and Wald [1] is an immediate consequence.
Other such optimum properties follow. Sequential and non-sequential problems
can be treated in this manner. Interval and point estimation are discussed.


1. Sequential estimation by an interval of given length l. In this section we
shall consider the problem of sequentially estimating the mean of a normal distribution
with known variance by an interval of fixed length l. Without loss of
generality we shall take the known variance to be unity. Such a sequential estimation
procedure, which we shall designate generically by G, is a rule which says a)
when to terminate taking random, independent observations on the normal
chance variable with unknown mean $\xi$ ($-\infty < \xi < \infty$) and variance 1, and
when this termination is to occur after the observations $x_1, \cdots, x_n$ have been
obtained, gives b) the center of the estimating interval of length l as a function
of $x_1, \cdots, x_n$. Let $\alpha(\xi, G)$ be the probability under G that the estimating interval
will contain $\xi$, and let $n(\xi, G)$ be the expected number of observations when $\xi$ is
the mean and G is the estimation procedure. (It is assumed that G is such
that $\alpha(\xi, G)$ and $n(\xi, G)$ exist for all $\xi$.)

Define 

?(£i G) = l — a(£, (7), 

and for fixed c > 0 


(Id) W(£, (?) = 3 ({, (?) + cn(f, 0). 


Let $C(N, l)$ ($l > 0$, $N$ a positive integer) be the classical non-sequential
estimation procedure where one takes the fixed number $N$ of observations, and estimates
the mean by the interval

$\left(\bar{x} - \frac{l}{2},\ \bar{x} + \frac{l}{2}\right),$

where $\bar{x}$ is the sample mean. For $p$ such that $0 < p < 1$, let $C(p, N, l)$ be the
following estimation procedure: A chance experiment with two outcomes, $N$ and $N + 1$,
of respective probabilities $p$ and $1 - p$, is performed. One then proceeds according
to $C(i, l)$, where $i$ ($= N$, $N + 1$) is the outcome of the experiment. Finally define

$M(y) = \frac{1}{\sqrt{2\pi}} \int_y^{\infty} \exp\{-\tfrac{1}{2} z^2\}\, dz.$


1 Research under a contract with the Office of Naval Research 




MINIMAX ESTIMATES OF MEAN 




Let us assume for a moment that the unknown $\xi$ is itself a chance variable,
normally distributed with mean zero and variance $\sigma^2$, and let us obtain a
procedure $G$ which minimizes

(1.2)  $E\{q(\xi, G) + c\,n(\xi, G)\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} [q(y, G) + c\,n(y, G)] \exp\left\{-\frac{y^2}{2\sigma^2}\right\} dy.$


Let $x_1, \cdots, x_m$ be $m$ independent observations on a normal chance variable
with mean $\xi$ and variance 1. Let

$\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i.$

The a posteriori distribution of $\xi$, given $x_1, \cdots, x_m$, is easily verified (or see
[1], eqs. (19) and (20)) to be normal with mean

(1.3)  $\bar{x} \left(1 + \frac{1}{m\sigma^2}\right)^{-1}$

and variance

(1.4)  $\left(m + \frac{1}{\sigma^2}\right)^{-1}.$


Thus if we stop after $m$ observations the best procedure from the point of view
of minimizing (1.2) is to put the center of the estimating interval of length $l$ at
the point (1.3). The conditional expected value of $q(\xi)$ is then

(1.5)  $Q(x_1, \cdots, x_m \mid \sigma^2) = 2M\left(\frac{l}{2} \sqrt{m + \frac{1}{\sigma^2}}\right).$

Thus $Q(x_1, \cdots, x_m \mid \sigma^2)$ is a function only of $m$ and $\sigma^2$. Define

(1.6)  $R(m, \sigma^2) = 2M\left(\frac{l}{2} \sqrt{m + \frac{1}{\sigma^2}}\right) - 2M\left(\frac{l}{2} \sqrt{m + 1 + \frac{1}{\sigma^2}}\right).$

We note that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a decreasing function of $m$. We conclude
that a best decision as to whether or not to take another observation must be
based on the value of $R(m, \sigma^2)$. If $R(m, \sigma^2) > c$ take another observation; if
$R(m, \sigma^2) < c$ do not take another observation; if $R(m, \sigma^2) = c$ take either action
at pleasure. Hence, if $\sigma^2$ is such that $R(N, \sigma^2) < c < R(N - 1, \sigma^2)$, a best
procedure from the point of view of minimizing (1.2) is to take exactly $N$
observations. This integer $N$ is a function of $c$ and $\sigma^2$, thus: $N(c, \sigma^2)$. In the next
paragraph we shall show that $N(c, \sigma^2)$ can be defined for every positive $c$ and $\sigma^2$.
It is clearly a function which takes at most two values. We shall denote by $G(\sigma^2)$
the estimation procedure described above which minimizes (1.2). It consists of
taking the fixed number $N(c, \sigma^2)$ of observations and putting the center of the
estimating interval of length $l$ at the point (1.3). Where $N(c, \sigma^2)$ is double-valued
we may take either value at pleasure. We verify that the value of (1.2) is the
same for either choice.
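The stopping rule just described can be sketched numerically. The following is a minimal illustration only, not part of the original argument: the function names `M`, `R`, `N_of` and `bayes_risk` are chosen here for convenience, and $M(y)$ is evaluated through the complementary error function.

```python
import math

def M(y):
    """Upper tail of the standard normal distribution, M(y) = P(Z > y)."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def R(m, sigma2, l):
    """Gain in expected coverage from the (m+1)-st observation, as in (1.6)."""
    prec = 1.0 / sigma2
    return (2.0 * M(0.5 * l * math.sqrt(m + prec))
            - 2.0 * M(0.5 * l * math.sqrt(m + 1 + prec)))

def N_of(c, sigma2, l, m_max=10**6):
    """Smallest m >= 0 with R(m, sigma2) <= c; a valid choice of N(c, sigma2)."""
    for m in range(m_max + 1):
        if R(m, sigma2, l) <= c:
            return m
    raise ValueError("m_max too small for this c")

def bayes_risk(m, c, sigma2, l):
    """Value of (1.2) for the fixed-sample procedure with m observations."""
    return 2.0 * M(0.5 * l * math.sqrt(m + 1.0 / sigma2)) + c * m

# Illustrative values (assumptions, not taken from the paper).
c, sigma2, l = 0.01, 4.0, 1.0
N = N_of(c, sigma2, l)
```

Because $R(m, \sigma^2)$ decreases in $m$, the first $m$ at which it falls to $c$ or below also minimizes `bayes_risk`, since consecutive values of `bayes_risk` differ by $c - R(m, \sigma^2)$.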



We now verify that $N(c, \sigma^2)$ can be defined for all positive $c$ and $\sigma^2$. We have
remarked earlier that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a monotonically decreasing
function of $m$. We note that

$\lim_{m \to \infty} R(m, \sigma^2) = 0.$

When $c > R(0, \sigma^2)$ we take no observations whatever and take $\bar{x} = 0$. When
$c = R(0, \sigma^2)$ we take zero or one observation at pleasure.

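Without difficulty we compute

$W(\xi, G(\sigma^2)) = W(\xi, \sigma^2) = cN + M\left(\sqrt{N}\left[\frac{l}{2}\left(1 + \frac{1}{N\sigma^2}\right) - \frac{\xi}{N\sigma^2}\right]\right) + M\left(\sqrt{N}\left[\frac{l}{2}\left(1 + \frac{1}{N\sigma^2}\right) + \frac{\xi}{N\sigma^2}\right]\right),$

where for typographical simplicity we have written $N$ for $N(c, \sigma^2)$. For fixed $c$ and
$\sigma^2$ the minimum of $W(\xi, \sigma^2)$ occurs at $\xi = 0$. Also $W(0, \sigma^2)$ is a monotonically
increasing function of $\sigma^2$. If $N(c, \infty) > 0$ then, as $\sigma^2 \to \infty$, it approaches the limit

$cN(c, \infty) + 2M\left(\frac{l}{2}\sqrt{N(c, \infty)}\right),$

which is the constant value of

$W(\xi, C(N(c, \infty), l)).$

We therefore conclude that $C(N(c, \infty), l)$ is a minimax estimating procedure of
type $G$, i.e.,

$W(\xi, C(N(c, \infty), l)) = \inf_G \sup_\xi W(\xi, G)$

for any $c > 0$. (The case $N(c, \infty) = 0$ may be verified separately. We define
$\bar{x} = 0$ for $C(0, l)$.)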

Conversely, let $N_0$ be a given non-negative integer. Then $C(N_0, l)$ is a minimax
estimating procedure $G$ for all $W(\xi, G)$ for which $c$ satisfies

$R(N_0, \infty) \le c \le R(N_0 - 1, \infty).$

(We define $R(-1, \infty) = \infty$.) Thus we can say: For every $c > 0$ there exists a
classical estimation procedure $C(N, l)$ with integral $N$ such that

$W(\xi, C(N, l)) = \inf_G \sup_\xi W(\xi, G).$

For every integral $N$ we can find at least one $c > 0$ such that the above equation
holds. The method of finding $N$, given $c$, and of finding $c$, given $N$, has been
described above. (We have taken the liberty of calling $C(0, l)$ a classical procedure.)

Let $\alpha_0$ be a given number such that





Define $p_0$, $0 < p_0 \le 1$, and a positive integral $N_0$ uniquely by

$\alpha_0 = p_0\left(1 - 2M\left(\frac{l}{2}\sqrt{N_0}\right)\right) + (1 - p_0)\left(1 - 2M\left(\frac{l}{2}\sqrt{N_0 + 1}\right)\right).$

Let

$c_0 = R(N_0, \infty).$

For $c = c_0$ we verify readily that both $C(N_0, l)$ and $C(N_0 + 1, l)$ are minimax
estimating procedures $G$, so that

$W(\xi, C(N_0, l)) = W(\xi, C(N_0 + 1, l))$
$\quad = p_0 W(\xi, C(N_0, l)) + (1 - p_0) W(\xi, C(N_0 + 1, l))$
$\quad = (1 - \alpha_0) + c_0[p_0 N_0 + (1 - p_0)(N_0 + 1)]$
$\quad = (1 - \alpha_0) + c_0[N_0 + (1 - p_0)].$

Therefore, for any $G$ whatever,

$(1 - \alpha_0) + c_0[N_0 + (1 - p_0)] \le \sup_\xi \{q(\xi, G) + c_0\, n(\xi, G)\} \le \sup_\xi q(\xi, G) + c_0 \sup_\xi n(\xi, G).$

Hence

$\sup_\xi q(\xi, G) \le 1 - \alpha_0$

implies

$\sup_\xi n(\xi, G) \ge N_0 + (1 - p_0),$

a result first proved by Stein and Wald [1].
Also

$\sup_\xi n(\xi, G) \le N_0 + (1 - p_0)$

implies

$\sup_\xi q(\xi, G) \ge 1 - \alpha_0,$

a result also proved in [1].
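The lower bound $N_0 + (1 - p_0)$ can be computed directly from $\alpha_0$ and $l$. The sketch below is only an illustration; the function names are hypothetical, and the admissible range $1 - 2M(l/2) \le \alpha_0 < 1$ enforced in the code is an assumption made here so that a positive integral $N_0$ exists.

```python
import math

def M(y):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def coverage(N, l):
    """Coverage 1 - 2M((l/2) sqrt(N)) of the classical interval C(N, l)."""
    return 1.0 - 2.0 * M(0.5 * l * math.sqrt(N))

def stein_wald_bound(alpha0, l):
    """Return (N0, p0, N0 + (1 - p0)) for the given alpha0 and length l.

    Assumption made for this sketch: coverage(1, l) <= alpha0 < 1.
    """
    if not (coverage(1, l) <= alpha0 < 1.0):
        raise ValueError("alpha0 outside the range assumed in this sketch")
    N0 = 1
    while coverage(N0 + 1, l) <= alpha0:
        N0 += 1
    lo, hi = coverage(N0, l), coverage(N0 + 1, l)
    p0 = (hi - alpha0) / (hi - lo)      # lies in (0, 1]
    return N0, p0, N0 + (1.0 - p0)

N0, p0, bound = stein_wald_bound(0.95, 1.0)
```

For $\alpha_0 = 0.95$ and $l = 1$ the coverage first exceeds 0.95 at 16 observations, so $N_0 = 15$ and the bound lies strictly between 15 and 16.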

2. A sequential upper bound for the mean. The fact that in the last section $l$
was a constant made matters simpler, as we see when we begin to consider the
problem of a sequential upper bound for $\xi$ $(-\infty < \xi < \infty)$. This of course means
that we wish to use as estimating interval the interval $(-\infty, L(x_1, \cdots, x_n))$
where $L$ is a function of the observations $x_1, \cdots, x_n$, and $n$ (a chance variable)
is the number of observations before the process of taking observations is
terminated. What is wanted now is a suitable definition of the “length” of this
interval. Also we shall admit the possibility that it might in some sense be
advantageous to have intervals of varying length; this poses the problem of optimum
choice of the function $L(x_1, \cdots, x_n)$.

As before, let $\xi$ be the mean of a normal distribution with unit variance. Let
$T$ be the generic estimation procedure which consists of a rule for terminating the
taking of observations, and of a function $L_T(x_1, \cdots, x_n)$ which is used to
estimate $\xi$ by the interval $(-\infty, L_T)$. Define

$q(\xi, T) = P\{L_T < \xi\},$

$\lambda(\xi, T) = E(L_T - \xi)^2,$

and

(2.1)  $W(\xi, T) = q(\xi, T) + k\lambda(\xi, T) + c\,n(\xi, T),$

where $c$ and $k$ are positive constants. (We admit only such $T$ for which the
quantities $q$, $\lambda$, and $n$ are defined for all real $\xi$.) As before, let us temporarily assume
that $\xi$ is normally distributed with mean zero and variance $\sigma^2$, and set ourselves
the task of minimizing

(2.2)  $\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} W(y, T) \exp\left\{-\frac{y^2}{2\sigma^2}\right\} dy = W^*(T, \sigma^2)$

with respect to $T$. In the next paragraph we digress for a moment to derive a
needed elementary inequality.

Let us prove that, if $h$, $h_1$, and $h_2$ are non-negative, and

(2.3)  $h^2 = p h_1^2 + (1 - p) h_2^2,$

where $0 < p < 1$, then

(2.4)  $M(h) \le p M(h_1) + (1 - p) M(h_2).$

Hold $h$ and $p$ fixed. The desired result is obviously true when $h_1 = h_2 = h$. Let
$h_1$ and $h_2$ vary, subject to (2.3). Then

$\frac{dh_2}{dh_1} = -\frac{p h_1}{(1 - p) h_2}.$

Also

$p\,\frac{dM(h_1)}{dh_1} = -\frac{p}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2} h_1^2\}$

and

$(1 - p)\,\frac{dM(h_2)}{dh_1} = (1 - p)\,\frac{dM(h_2)}{dh_2}\,\frac{dh_2}{dh_1} = \frac{p h_1}{\sqrt{2\pi}\, h_2} \exp\{-\tfrac{1}{2} h_2^2\}.$

Thus the derivative of the right member of (2.4) with respect to $h_1$ is 0 when
$h_1 = h$, positive when $h_1 > h$, and negative when $h_1 < h$. From this we get (2.4).
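Inequality (2.4) lends itself to a quick numerical check. The following sketch, with randomly drawn $h_1$, $h_2$, $p$ (the ranges and the seed are arbitrary choices made here), records the difference between the right and left members of (2.4); it is a sanity check, not a proof.

```python
import math
import random

def M(y):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def both_sides(h1, h2, p):
    """Return (M(h), p*M(h1) + (1-p)*M(h2)) with h defined by (2.3)."""
    h = math.sqrt(p * h1 * h1 + (1.0 - p) * h2 * h2)
    return M(h), p * M(h1) + (1.0 - p) * M(h2)

random.seed(0)
gaps = []
for _ in range(1000):
    h1 = random.uniform(0.0, 5.0)
    h2 = random.uniform(0.0, 5.0)
    p = random.uniform(0.01, 0.99)
    lhs, rhs = both_sides(h1, h2, p)
    gaps.append(rhs - lhs)   # should be non-negative up to rounding
```

The inequality is the statement that $M(\sqrt{t})$ is a convex function of $t = h^2$ for $t \ge 0$, which is why the weights in (2.3) act on the squares of the $h$'s.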





Let $T$ be any estimation procedure and $L_T(x_1, \cdots, x_n)$ its associated
function. Write

$l_T(x_1, \cdots, x_n) = L_T(x_1, \cdots, x_n) - \bar{x}\left[1 + \frac{1}{n\sigma^2}\right]^{-1}.$

If $n = m$ and $x_1, \cdots, x_m$ is the sample obtained, we have that the conditional
expected value of $W^*(T, \sigma^2)$ is

(2.5)  $M\left(l_T(x_1, \cdots, x_m) \sqrt{m + \frac{1}{\sigma^2}}\right) + cm + kE(U_m + l_T(x_1, \cdots, x_m))^2,$

where $U_m$ is a normally distributed chance variable with mean zero and variance
$(m + 1/\sigma^2)^{-1}$. The last term in (2.5) is therefore

$k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + l_T^2(x_1, \cdots, x_m)\right].$

This is an even function of $l_T$, while the first term of (2.5) is a monotonically
decreasing function of $l_T$. Thus (2.5) and hence $W^*(T, \sigma^2)$ will be minimized by
taking $l_T$ non-negative. Now take the expected value of (2.5) over the set of
samples where $n = m$. Application of the result of the preceding paragraph to
the finite sums which approximate the integral gives the result that $W^*(T, \sigma^2)$ is
minimized when $l_T(x_1, \cdots, x_m)$ is a function only of $m$. Hence we may restrict
ourselves to consideration of procedures $T$ for which (2.5) takes the value

(2.6)  $M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l_T(m)\right) + cm + k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + [l_T(m)]^2\right].$

For any such procedure $T$, since $k$ and $c$ are fixed positive numbers (and $\sigma^2$ is
held fixed for the present), the expression (2.6) takes its minimum for some
value of $m$. Thus, in our quest for a procedure $T$ which will minimize $W^*(T, \sigma^2)$
we may restrict ourselves to procedures of fixed sample size. This fixed sample
size and the (constant) value of $l_T$ are functions of $k$, $c$, and $\sigma^2$. For fixed $m$,

$M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l\right) + k l^2$

has an absolute minimum at $l_m$, say, since it is a continuous function of $l$ $(l \ge 0)$
which approaches $\infty$ with $l$. The case $m = 0$ must be considered. (In this event
$\bar{x} = 0$.) Now consider the sequence

$\left\{ M\left(\sqrt{m + \frac{1}{\sigma^2}}\; l_m\right) + cm + k\left[\left(m + \frac{1}{\sigma^2}\right)^{-1} + l_m^2\right] \right\}$

for $m = 0, 1, 2, \cdots$ ad inf. This sequence condenses only at $\infty$. Hence there
exists a value $N(k, c, \sigma^2)$ of $m$ for which the elements of this sequence have a
minimum value. We may choose $N(k, c, \sigma^2)$ so that $\lim_{\sigma^2 \to \infty} N(k, c, \sigma^2)$ exists.
(We verify easily that this is always possible.) Designate this limit by $N(k, c, \infty)$,
and the associated $l$ by $l(k, c, \infty)$. The $l$ associated with $N(k, c, \sigma^2)$ will be
designated by $l(k, c, \sigma^2)$. Thus a best procedure for minimizing $W^*(T, \sigma^2)$ is to take
the fixed number $N(k, c, \sigma^2)$ of observations, and to use, as upper bound for $\xi$, the
quantity

$\bar{x}\left[1 + \frac{1}{\sigma^2 N(k, c, \sigma^2)}\right]^{-1} + l(k, c, \sigma^2).$

We see readily that

$l(k, c, \infty) = \lim_{\sigma^2 \to \infty} l(k, c, \sigma^2)$

and that

$M\left(\sqrt{N(k, c, \infty)}\; l(k, c, \infty)\right) = \lim_{\sigma^2 \to \infty} M\left(\sqrt{N(k, c, \sigma^2) + \frac{1}{\sigma^2}}\; l(k, c, \sigma^2)\right).$

Let $T(\sigma^2)$ be the procedure described above which is a best procedure $T$ in the
sense of minimizing $W^*(T, \sigma^2)$ when $\sigma^2$ is the variance of $\xi$.

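We now compute $W(\xi, T(\sigma^2))$ and obtain

(2.7)  $W(\xi, T(\sigma^2)) = cN + k\left[\frac{N\sigma^4}{(1 + N\sigma^2)^2} + \left(l - \frac{\xi}{1 + N\sigma^2}\right)^2\right] + M\left(\frac{1 + N\sigma^2}{\sqrt{N}\,\sigma^2}\left[l - \frac{\xi}{1 + N\sigma^2}\right]\right),$

where for brevity we have written $N$ and $l$ for $N(k, c, \sigma^2)$ and $l(k, c, \sigma^2)$. Let

$l - \frac{\xi}{1 + N\sigma^2} = x, \qquad \frac{1 + N\sigma^2}{\sqrt{N}\,\sigma^2} = \sqrt{N} + \epsilon.$

Then

(2.8)  $W = cN + k\left[(\sqrt{N} + \epsilon)^{-2} + x^2\right] + M([\sqrt{N} + \epsilon]\, x),$

(2.9)  $\frac{dW}{dx} = 2kx - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^2\right].$

The second term above is always of the same sign and the exponential decreases
as $|x|$ increases. Thus $dW/dx = 0$ has the unique positive root $x^*$. Put $x^*$ for
$x$ in $W$ (in (2.8)) and call the result $W^*$. $W$ is a continuous function of $x$ and
approaches $\infty$ as $|x| \to \infty$. Since the root $x^*$ is unique it follows that $W^*$ is the
minimum value of $W$ with respect to $x$. Now $N(k, c, \sigma^2)$ is constant for $\sigma^2$
sufficiently large. Hence, for such $\sigma^2$, we have

(2.10)  $\frac{dW^*}{d\epsilon} = \left[2kx^* - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\}\right] \frac{dx^*}{d\epsilon} - \frac{x^*}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\} - \frac{2k}{(\sqrt{N} + \epsilon)^3}$
$\qquad = -\frac{2k}{(\sqrt{N} + \epsilon)^3} - \frac{x^*}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\},$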




since $x^*$ is the root of $dW/dx = 0$. Also $\epsilon$ is positive and, for $\sigma^2$ sufficiently large,
approaches zero monotonically as $\sigma^2$ approaches $\infty$. For $\epsilon > 0$ we have that
$dW^*/d\epsilon < 0$, since $x^* > 0$. We conclude: For $\sigma^2$ sufficiently large,

$\min_\xi W(\xi, T(\sigma^2))$

increases monotonically with $\sigma^2$ and approaches

$cN + k\left[\frac{1}{N} + \{x_N(k)\}^2\right] + M(\sqrt{N}\, x_N(k)),$

where $N$ is short for $N(k, c, \infty)$ and $x_N(k)$ is the unique positive root of the
equation in $x$

$2kx = \frac{\sqrt{N}}{\sqrt{2\pi}} \exp[-\tfrac{1}{2} N x^2].$

Going back to the definition of $l(k, c, \infty)$ we see that the latter satisfies the
equation in $l$:

$\frac{d}{dl}\{M(\sqrt{N}\, l) + k l^2\} = 0.$

Hence

$x_N(k) = l(k, c, \infty).$

Thus the classical estimation procedure $C_0$ where one takes the fixed number
$N(k, c, \infty)$ of observations and uses as upper bound for the mean $\bar{x} + l(k, c, \infty)$
is a minimax procedure $T$, i.e.,

$W(\xi, C_0) = \inf_T \sup_\xi W(\xi, T).$


For fixed $N$, $x_N(k)$ decreases monotonically from $+\infty$ to 0 as $k$ increases from
0 to $+\infty$. Hence, for given positive integral $N_0$ and $l^* > 0$, there is a unique
positive value $k_0$ such that $x_{N_0}(k_0) = l^*$. Consider the expression

(2.11)  $B(m) = M(\sqrt{m}\, x_m(k_0)) + cm + k_0\left[\frac{1}{m} + \{x_m(k_0)\}^2\right],$

where $m$ is a positive, continuous variable. We have

(2.12)  $\frac{dB(m)}{dm} = c - \frac{k_0}{m^2} + \frac{dx_m(k_0)}{dm}\, \frac{\partial}{\partial x_m(k_0)}\left\{M(\sqrt{m}\, x_m(k_0)) + k_0 [x_m(k_0)]^2\right\} + \frac{\partial M(\sqrt{m}\, x_m(k_0))}{\partial m}.$

The third term of the right member is identically zero because

(2.13)  $2k_0 x_m(k_0) = \frac{\sqrt{m}}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2} m [x_m(k_0)]^2\}.$



Further we have

$\frac{d^2 B(m)}{dm^2} = \frac{2k_0}{m^3} - \frac{d}{dm}\left\{\frac{x_m(k_0)}{2\sqrt{2\pi m}} \exp\{-\tfrac{1}{2} m [x_m(k_0)]^2\}\right\} = \frac{2k_0}{m^3} - k_0 \frac{d}{dm}\left\{\frac{[x_m(k_0)]^2}{m}\right\}.$

For typographical simplicity we shall use $y$ for $x_m(k_0)$ in the computations of the
next few lines. From (2.13) we obtain

$\log 2k_0 + \log y = -\log \sqrt{2\pi} + \tfrac{1}{2} \log m - \tfrac{1}{2} m y^2,$

$\frac{1}{y} \frac{dy}{dm} = \frac{1}{2m} - \frac{y^2}{2} - m y \frac{dy}{dm},$

$\frac{dy}{dm} = \frac{y(1 - my^2)}{2m(1 + my^2)}.$

Hence

$\frac{d^2 B(m)}{dm^2} = 2k_0 m^{-3} + k_0 m^{-2} y^2 - 2k_0 m^{-1} y \frac{dy}{dm}$
$\qquad = 2k_0 m^{-3} + k_0 m^{-2} y^2 - \frac{k_0 y^2 (1 - my^2)}{m^2 (1 + my^2)}$
$\qquad = 2k_0 m^{-3} + \frac{2k_0 y^4}{m(1 + my^2)} > 0.$

Since $c > 0$, we have

$\lim_{m \to 0} B(m) = \lim_{m \to \infty} B(m) = +\infty.$

Hence there exists a value of $m$ for which $B(m)$ takes its minimum value. If in
$dB(m)/dm$ we put $m = N_0$ and set the resulting expression equal to zero, we
obtain an equation in $c$ whose unique solution $c_0$, if it is positive, assures us that,
when $c = c_0$ and $k = k_0$, $B(m)$ takes its minimum at $m = N_0$. A simple
computation gives

(2.16)  $c_0 = \frac{k_0}{N_0^2} + \frac{l^* \exp\{-\tfrac{1}{2} N_0 l^{*2}\}}{2\sqrt{2\pi N_0}} > 0.$

Actually we are interested in considering $B(m)$ only for positive, integral values
of $m$. We see readily that the minimum of $B(m)$ occurs then at $m = N_0$ when
$c$ is such that

(2.17)  $c_1(N_0, k_0) \le c \le c_2(N_0, k_0),$

with $c_1$ and $c_2$ roots of the following equations in $c$:

$B(N_0) = B(N_0 + 1),$

$B(N_0) = B(N_0 - 1).$

(If $N_0 = 1$, then $c_2 = \infty$.)





Let $C_0 = C_0(N_0, l^*)$ be the classical (non-sequential) procedure where one takes
$N_0$ observations and uses $\bar{x} + l^*$ as upper bound for the mean. Choose $k = k_0$ and
$c$ such that (2.17) is satisfied. Then

$W(\xi, C_0) = cN_0 + k_0\left(\frac{1}{N_0} + (l^*)^2\right) + M(\sqrt{N_0}\, l^*)$

identically in $\xi$. $C_0(N_0, l^*)$ is a procedure $T$ such that

(2.18)  $W(\xi, C_0) = \inf_T \sup_\xi W(\xi, T).$

Whenever $c$ and $k$ are given, the $N$ and $l$ of the minimax solution may
be obtained as follows: First we obtain an integer $N$ such that

$c_1(N, k) \le c \le c_2(N, k).$

Knowing $N$ and $k$ we can then solve for $l$.

The results of this section may be summarized as follows: For every positive
$c$ and $k$ there exists a classical estimation procedure $C_0(N, l)$ with positive integral
$N$ and $l > 0$ such that (2.18) holds. Conversely, for every such pair $(N, l)$ there
exists a positive pair $(c, k)$ so that (2.18) holds. A method of finding one member
of the pair of couples $(c, k)$ and $(N, l)$ when the other is given, has been indicated
above.
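The passage from $(c, k)$ to $(N, l)$ can be sketched numerically as follows. This is an illustration under assumptions made here: the root $x_N(k)$ is found by bisection, the integer $N$ is found by direct search over $B(N)$ instead of through $c_1$ and $c_2$, and all function names are hypothetical.

```python
import math

def M(y):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def x_root(N, k):
    """Unique positive root of 2kx = sqrt(N/(2 pi)) exp(-N x^2 / 2), by bisection."""
    def g(x):
        return 2.0 * k * x - math.sqrt(N / (2.0 * math.pi)) * math.exp(-0.5 * N * x * x)
    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:          # g is increasing with g(0) < 0
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def B(N, k, c):
    """Constant risk of the classical procedure with N observations and l = x_N(k)."""
    x = x_root(N, k)
    return M(math.sqrt(N) * x) + c * N + k * (1.0 / N + x * x)

def minimax_plan(k, c, N_max=10**6):
    """Positive integer N minimizing B(N), with the associated l and risk."""
    best_N, best = 1, B(1, k, c)
    for N in range(2, N_max + 1):
        val = B(N, k, c)
        if val < best:
            best_N, best = N, val
        if c * N > best:        # B(N) >= c*N, so no later N can improve
            break
    return best_N, x_root(best_N, k), best

k, c = 1.0, 0.001               # illustrative values, not from the paper
N, l, risk = minimax_plan(k, c)
```

The direct search is equivalent to locating the $N$ with $c_1(N, k) \le c \le c_2(N, k)$, because $B(m)$ has a positive second derivative in $m$ and therefore a single minimum over the integers, apart from ties.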

Let $T_1$ be any procedure for giving an upper bound for $\xi$. We shall say that
$T_1$ is optimum if for any other procedure $T_2$ such that

$\sup_\xi q(\xi, T_2) \le \sup_\xi q(\xi, T_1),$

$\sup_\xi \lambda(\xi, T_2) \le \sup_\xi \lambda(\xi, T_1),$

we have

$\sup_\xi n(\xi, T_2) \ge \sup_\xi n(\xi, T_1).$

It is easy to prove that the classical procedure $C_0$ with any positive $l$ and positive
integral $N$ is optimum by using the results of the last paragraph. For let
$1 - \alpha = M(l\sqrt{N})$ and let $k$ and $c$ be the corresponding parameters. We have then

$\sup_\xi q(\xi, T_2) + k \sup_\xi \lambda(\xi, T_2) + c \sup_\xi n(\xi, T_2) \ge \sup_\xi \{q(\xi, T_2) + k\lambda(\xi, T_2) + c\,n(\xi, T_2)\} \ge (1 - \alpha) + k\left(\frac{1}{N} + l^2\right) + cN.$

Since $\sup_\xi q(\xi, T_2) \le (1 - \alpha)$ and $\sup_\xi \lambda(\xi, T_2) \le 1/N + l^2$, we must have

$\sup_\xi n(\xi, T_2) \ge N,$

which is the desired result.

In a general imprecise way we may say that an estimation procedure is the
better the smaller the three quantities

$\beta_1(T) = \sup_\xi q(\xi, T), \quad \beta_2(T) = \sup_\xi \lambda(\xi, T), \quad \beta_3(T) = \sup_\xi n(\xi, T).$





We can now assert the following: No sequential procedure $T$ can be superior to
the classical fixed sample procedure $C$ in the sense that

$\beta_i(T) \le \beta_i(C)$ for $i = 1, 2, 3$

and the inequality sign holds for at least one $i$.

In concluding this section we may remark that the case $\alpha < \tfrac{1}{2}$, i.e., $l < 0$,
may be handled in the same manner as above except that we use $M(-l\sqrt{m})$
in place of $M(l\sqrt{m})$.

3. Miscellaneous results; point estimation. Without going into the
necessarily involved details, we content ourselves with pointing out that the problem
of estimating sequentially the mean of a normal distribution by a finite interval
of length not specified in advance can be solved in similar fashion. As before
let $\xi$ be the unknown mean of a normal distribution with unit variance, where $\xi$
may be any real value. We want to estimate $\xi$ by an interval

$(L_1(x_1, \cdots, x_n),\ L_2(x_1, \cdots, x_n)).$

Let $c$, $k_1$, and $k_2$ be positive constants and consider the problem of minimizing
the supremum with respect to $\xi$ of

$1 - P\{L_1 < \xi < L_2 \mid \xi, G^1\} + c\,n(\xi, G^1) + k_1 E[(L_1 - \xi)^2 \mid \xi, G^1] + k_2 E[(L_2 - \xi)^2 \mid \xi, G^1],$

where $G^1$ is the generic designation of the estimation procedure. As before, employ
an a priori normal distribution of $\xi$ with mean zero and variance $\sigma^2$, and let
$\sigma^2 \to \infty$. A fixed sample size procedure will be a minimax solution. It will possess
optimum properties similar to those described in the preceding sections. The
problem of minimizing the supremum with respect to $\xi$ of

$1 - P\{L_1 < \xi < L_2 \mid \xi, G^1\} + c\,n(\xi, G^1) + kE[(L_2 - L_1)^2 \mid \xi, G^1]$

can be treated similarly.

Suppose the sample size is fixed in advance. The problem of finding an estimate
which will minimize

$\sup_\xi [1 - P\{L_1 < \xi < L_2 \mid \xi, G^1\} + k_1 E[(L_1 - \xi)^2 \mid \xi, G^1] + k_2 E[(L_2 - \xi)^2 \mid \xi, G^1]]$

or

$\sup_\xi [1 - P\{L_1 < \xi < L_2 \mid \xi, G^1\} + kE[(L_2 - L_1)^2 \mid \xi, G^1]]$

can be treated by the method of the preceding sections.

The problem of estimating (sequentially or with fixed sample size) the means
of a multivariate normal distribution with known covariance matrix can be
treated in similar fashion.

Suppose it is desired to estimate sequentially the mean $\xi$ $(-\infty < \xi < \infty)$
of a normal distribution with unit variance by means of a chance point
$\zeta(x_1, \cdots, x_n)$. Let $R(\xi, b)$ be the Wald risk function (cf. [2]), a non-negative
function which measures the loss incurred in using the particular value $b$ as an
estimate when $\xi$ is the actual value. The functions $\zeta(x_1, \cdots, x_n)$ and $R(\xi, b)$
must have suitable measurability properties for which we refer the reader to [2].
Let us seek a procedure $\zeta^*$ such that

$\sup_\xi [E\{R(\xi, \zeta^*)\} + c\,n(\xi, \zeta^*)] = \inf_\zeta \sup_\xi [E\{R(\xi, \zeta)\} + c\,n(\xi, \zeta)].$

Here $n(\xi, \zeta)$ is the average number of observations under $\zeta$ when $\xi$ is the “true”
mean. The procedure $\zeta^*$ will be called a minimax solution. We shall assume that
$R(a, b)$ is a monotonically non-decreasing function of $|a - b|$, and that there
exists a positive number $\eta$ such that



$\int_{-\infty}^{\infty} R(0, x) \exp\left\{-\frac{x^2}{2\eta}\right\} dx < \infty.$


As examples of functions with these properties we may cite

$R(a, b) = |a - b|,$
$R(a, b) = (a - b)^2.$

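As before, assume temporarily that $\xi$ is normally distributed with mean zero
and variance $\sigma^2$. We verify without difficulty that a solution $\zeta = \zeta_\sigma$ which
minimizes

$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} [E\{R(\xi, \zeta)\} + c\,n(\xi, \zeta)] \exp\left\{-\frac{\xi^2}{2\sigma^2}\right\} d\xi$

is the following: $n$ is identically a suitable constant, say $N$, and
$\zeta_\sigma = \bar{x}(1 + 1/N\sigma^2)^{-1} = \lambda\bar{x}$, say, so that $\lambda < 1$. For this solution, we have

$E\{R(\xi, \zeta_\sigma)\} + c\,n(\xi, \zeta_\sigma) = cN + \sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(\xi, \lambda\bar{x}) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x}.$

Write $u = \bar{x} - \xi$. Then

$R(\xi, \lambda\bar{x}) = R(\xi, \lambda[\xi + u]) = R(0, \lambda u - [1 - \lambda]\xi),$

$\sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(\xi, \lambda\bar{x}) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x}$
$\qquad = \sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(0, \lambda u - [1 - \lambda]\xi) \exp\left\{-\frac{N u^2}{2}\right\} du$
$\qquad = \frac{1}{\lambda}\sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(0, v) \exp\left\{-\frac{N}{2\lambda^2}(v + [1 - \lambda]\xi)^2\right\} dv.$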

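Because of the assumptions on the function $R$ the last expression is a minimum
when $\xi = 0$. We may always choose $N$ such that, for large enough $\sigma^2$, the integer
$N$ is a constant, say $N_0$. Also $\lambda \to 1$ as $\sigma^2 \to \infty$. Thus we conclude that the
following is a minimax solution: $n = N_0$ and $\zeta = \bar{x}$. If any estimation procedure
$\zeta$ is such that $\sup_\xi n(\xi, \zeta) \le N_0$ then

$\sup_\xi E\{R(\xi, \zeta)\} \ge \sup_\xi E\{R(\xi, \bar{x})\}.$

If $\zeta$ is such that

$\sup_\xi E\{R(\xi, \zeta)\} \le \sup_\xi E\{R(\xi, \bar{x})\},$

then

$\sup_\xi n(\xi, \zeta) \ge N_0.$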
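For the squared-error example $R(a, b) = (a - b)^2$ the quantities above have closed forms, which the following sketch evaluates. The formulas in the comments are derived here for this special case and are not quoted from the paper; the function names are hypothetical.

```python
import math

def shrinkage(N, sigma2):
    """lambda = (1 + 1/(N sigma^2))^{-1}, the factor applied to the sample mean."""
    return 1.0 / (1.0 + 1.0 / (N * sigma2))

def risk_sq(xi, N, sigma2):
    """E(lambda*xbar - xi)^2 = lambda^2/N + (1 - lambda)^2 xi^2 for squared error."""
    lam = shrinkage(N, sigma2)
    return lam * lam / N + ((1.0 - lam) * xi) ** 2

def minimax_N_sq(c):
    """Positive integer N0 minimizing c*N + 1/N, the constant risk of xbar plus cost."""
    n = max(1, int(math.floor(1.0 / math.sqrt(c))))
    return min((n, n + 1), key=lambda m: c * m + 1.0 / m)

N0 = minimax_N_sq(0.01)
```

As $\sigma^2 \to \infty$ the factor $\lambda$ tends to 1 and `risk_sq` tends to $1/N$ for every $\xi$, which is the constant risk of the sample mean.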

i 


If the restrictions imposed above on $R$ are satisfied and if the sample must
always be of given size $N$, the above argument still holds when $1/N \le \eta$, and
shows that the estimate $\bar{x}$ minimizes

$\sup_\xi E\{R(\xi, \zeta)\}$

with respect to $\zeta$.


REFERENCES

[1] C. Stein and A. Wald, “Sequential confidence intervals for the mean of a normal distribution with known variance,” Annals of Math. Stat., Vol. 18 (1947), pp. 427-433.

[2] A. Wald, “Statistical decision functions,” Annals of Math. Stat., Vol. 20 (1949), pp. 165-205.



ASYMPTOTIC PROPERTIES OF THE WALD-WOLFOWITZ TEST 

OF RANDOMNESS 

By Gottfried Emanuel Noether
New York University

1. Summary. The paper investigates certain asymptotic properties of the
test of randomness based on the statistic $R_h = \sum_{i=1}^{n} x_i x_{i+h}$ proposed by Wald
and Wolfowitz. It is shown that the conditions given in the original paper
for asymptotic normality of $R_h$ when the null hypothesis of randomness is
true can be weakened considerably. Conditions are given for the consistency
of the test when under the alternative hypothesis consecutive observations
are drawn independently from changing populations with continuous cumulative
distribution functions. In particular a downward (upward) trend and a regular
cyclical movement are considered. For the special case of a regular cyclical
movement of known length the asymptotic relative efficiency of the test based
on ranks with respect to the test based on original observations is found. A simple
condition for the asymptotic normality of $R_h$ for ranks under the alternative
hypothesis is given. This asymptotic normality is used to compare the asymptotic
power of the $R_h$-test with that of the Mann $T$-test in the case of a downward
trend.

2. Introduction. The hypothesis of randomness, i.e., the assumption that the
chance variables $X_1, \cdots, X_n$ have the joint cumulative distribution function
(cdf) $F(x_1, \cdots, x_n) = F(x_1) \cdots F(x_n)$ where $F(x)$ may be any cdf, is basic in
many statistical problems. Several tests of randomness designed to detect
changes in the underlying population have been suggested, however mostly on
intuitive grounds. Very seldom has the actual performance of a test with respect
to a given class of alternatives been investigated. It is the intention of this
paper to carry out such an investigation for the particular test based on the
statistic

$R_h = \sum_{i=1}^{n} x_i x_{i+h}, \qquad x_{n+j} = x_j,$

proposed by Wald and Wolfowitz [1]. It is suggested in [1] that this test is
suitable if the alternative to randomness is the existence of a trend or a regular
cyclical movement. Both these cases will be treated.

Let $a_1, \cdots, a_n$ be observations on the chance variables $X_1, \cdots, X_n$ and
assume that the hypothesis of randomness is true. (Henceforth this hypothesis
will be denoted by $H_0$ while the hypothesis that an alternative to randomness is
true will be denoted by $H_1$.) Restricting then $X_1, \cdots, X_n$ to the subpopulation
of permutations of $a_1, \cdots, a_n$, any one of the $n!$ possible permutations is
equally likely, and the distribution of $R_h$ in this subpopulation can be found. If

the level of significance is $m/n!$, where $m$ is a positive integer, the test is
performed by selecting $m$ of the $n!$ possible values of $R_h$ and rejecting $H_0$ when
the observed value of $R_h$ is one of these $m$ values. The particular choice of
these values is made so as to maximize the power of the test with respect to the
alternatives under consideration.

Denote the expected value and variance of $R_h$ in the subpopulation of equally
likely permutations of $n$ observations $a_1, \cdots, a_n$ by $E^0 R_h$ and $V^0 R_h$,
respectively. Then it is shown in [1] that if $h$ is prime to $n$,


(2.1)  $E^0 R_h = \frac{A_1^2 - A_2}{n - 1}$

and

(2.2)  $V^0 R_h = \frac{A_2^2 - A_4}{n - 1} + \frac{A_1^4 - 4A_1^2 A_2 + 4A_1 A_3 + A_2^2 - 2A_4}{(n - 1)(n - 2)} - \frac{(A_1^2 - A_2)^2}{(n - 1)^2},$

where $A_r = a_1^r + \cdots + a_n^r$, $(r = 1, 2, 3, 4)$. Actually (2.1) and (2.2) are valid
as soon as $n > 2h$.
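Formulas (2.1) and (2.2) can be checked by brute force for a small sample, enumerating all $n!$ permutations with exact rational arithmetic. The sketch below is only a verification aid; the sample values and function names are arbitrary choices made here.

```python
from fractions import Fraction
from itertools import permutations

def circular_R(x, h):
    """R_h = sum of x_i * x_{i+h} with indices taken modulo n."""
    n = len(x)
    return sum(x[i] * x[(i + h) % n] for i in range(n))

def formula_mean_var(a):
    """E^0 R_h and V^0 R_h from (2.1) and (2.2); they do not depend on h."""
    n = len(a)
    A1, A2, A3, A4 = (sum(Fraction(v) ** r for v in a) for r in (1, 2, 3, 4))
    mean = (A1**2 - A2) / (n - 1)
    var = ((A2**2 - A4) / (n - 1)
           + (A1**4 - 4 * A1**2 * A2 + 4 * A1 * A3 + A2**2 - 2 * A4)
           / ((n - 1) * (n - 2))
           - (A1**2 - A2) ** 2 / (n - 1) ** 2)
    return mean, var

def brute_mean_var(a, h):
    """Exact mean and variance of R_h over all n! equally likely permutations."""
    vals = [Fraction(circular_R(p, h)) for p in permutations(a)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

a = (1, 2, 4, 7, 11)   # arbitrary sample with n = 5
```

With $n = 5$ both $h = 1$ and $h = 2$ satisfy $n > 2h$, so the same pair of formulas should reproduce the enumerated moments for either lag.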

Let $R_h^0 = (R_h - E^0 R_h)/\sqrt{V^0 R_h}$. Then it is also shown in [1] that if $h$ is prime
to $n$, $R_h^0$ is asymptotically normally distributed with mean 0 and variance 1
provided the $a_i$, $(i = 1, \cdots, n)$, satisfy condition W:

$\frac{\frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^r}{\left[\frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^2\right]^{r/2}} = O(1),^1 \qquad (r = 3, 4, \cdots),$

where $\bar{a} = n^{-1}\sum_{i=1}^{n} a_i$.

It is easily seen that condition W is satisfied when the original observations are
replaced by ranks. When the $a_1, \cdots, a_n$ are independent observations on the
same chance variable $X$, condition W is satisfied with probability 1 provided $X$
has positive variance and finite moments of all orders. It is interesting to compare
this condition for asymptotic normality of $R_h$ in the population of permutations
of observations on the chance variable $X$ with the condition for asymptotic
normality of $R_h$ under random sampling. For this case Hoeffding and Robbins
[3] have shown that it is sufficient to assume that $X$ has a finite absolute moment
of order 3. Thus it is desirable to weaken condition W. This will be done in
Section 3.
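That ranks satisfy condition W can be seen numerically: the standardized moments of the centered ranks stay bounded as $n$ grows. The sketch below computes them for one $n$; the comparison value 9/5 is the fourth standardized moment of the continuous uniform distribution, quoted here as a known fact and not taken from the paper.

```python
def centered_ranks(n):
    """Ranks shifted to -(n-1)/2, -(n-3)/2, ..., (n-1)/2."""
    return [i - (n - 1) / 2 for i in range(n)]

def standardized_moment(a, r):
    """(1/n) sum (a_i - abar)^r divided by [(1/n) sum (a_i - abar)^2]^(r/2)."""
    n = len(a)
    abar = sum(a) / n
    m2 = sum((x - abar) ** 2 for x in a) / n
    mr = sum((x - abar) ** r for x in a) / n
    return mr / m2 ** (r / 2)

ranks = centered_ranks(1001)
third = standardized_moment(ranks, 3)    # vanishes by symmetry
fourth = standardized_moment(ranks, 4)   # close to 9/5 for large n
```

Odd standardized moments vanish by symmetry and the even ones converge to those of the uniform distribution, so the ratio in condition W is $O(1)$ for every $r$.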

¹ The symbols O and o, and ~ to be used later, have their usual meaning.
See, for example, Cramér [2], p. 122.

In further sections the consistency and efficiency of the test based on $R_h$ will
be examined assuming that under the alternative hypothesis observations,
though still independent, are drawn from changing populations. Throughout the
paper the circularly defined statistic $R_h$ is used. However, if with probability 1

JVllU-U 4.f X n Th — o(/iflO, 

it is seen that asymptotically the test based on the non-circular

$\bar{R}_h = \sum_{i=1}^{n-h} x_i x_{i+h}$

has the same properties as that based on $R_h$. We find

$E^0 \bar{R}_h = \frac{n - h}{n(n - 1)} (A_1^2 - A_2),$

$V^0 \bar{R}_h = \frac{n - h}{n(n - 1)} (A_2^2 - A_4) + \frac{2(n - 2h)}{n(n - 1)(n - 2)} (A_1^2 A_2 - A_2^2 - 2A_1 A_3 + 2A_4)$
$\qquad + \frac{(n - h - 1)(n - h - 2) + 2(h - 1)}{n(n - 1)(n - 2)(n - 3)} (A_1^4 - 6A_1^2 A_2 + 8A_1 A_3 + 3A_2^2 - 6A_4)$
$\qquad - \frac{(n - h)^2}{n^2 (n - 1)^2} (A_1^2 - A_2)^2.$
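The moments of the non-circular statistic can be verified in the same way as (2.1) and (2.2), by enumerating all permutations of a small sample. This is a verification sketch with arbitrary sample values; it assumes $n \ge 2h$ and $n \ge 4$, the range in which the counting behind the coefficients is valid.

```python
from fractions import Fraction
from itertools import permutations

def noncircular_R(x, h):
    """Non-circular statistic: sum of x_i * x_{i+h} for i = 1, ..., n - h."""
    return sum(x[i] * x[i + h] for i in range(len(x) - h))

def formula_noncircular(a, h):
    """Permutation mean and variance of the non-circular statistic."""
    n = len(a)
    A1, A2, A3, A4 = (sum(Fraction(v) ** r for v in a) for r in (1, 2, 3, 4))
    mean = Fraction(n - h, n * (n - 1)) * (A1**2 - A2)
    var = (Fraction(n - h, n * (n - 1)) * (A2**2 - A4)
           + Fraction(2 * (n - 2 * h), n * (n - 1) * (n - 2))
           * (A1**2 * A2 - A2**2 - 2 * A1 * A3 + 2 * A4)
           + Fraction((n - h - 1) * (n - h - 2) + 2 * (h - 1),
                      n * (n - 1) * (n - 2) * (n - 3))
           * (A1**4 - 6 * A1**2 * A2 + 8 * A1 * A3 + 3 * A2**2 - 6 * A4)
           - Fraction((n - h) ** 2, n**2 * (n - 1) ** 2) * (A1**2 - A2) ** 2)
    return mean, var

def brute_noncircular(a, h):
    """Exact mean and variance over all n! equally likely permutations."""
    vals = [Fraction(noncircular_R(p, h)) for p in permutations(a)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

a = (1, 2, 4, 7, 11)   # arbitrary sample with n = 5
```

Unlike the circular case, these moments depend on $h$ through the number $n - h$ of products and through the number of pairs of products sharing an observation.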


3. Asymptotic normality of $R_h$ under randomization. Let the set of chance
variables $X_1, \cdots, X_n$ be defined on the $n!$ equally likely permutations of $n$
numbers $\mathfrak{A}_n = (a_1, \cdots, a_n)$. Then we have

Theorem 1: The distribution of $R_h^0$ tends to the normal distribution with mean 0
and variance 1 as $n \to \infty$ provided

(3.1)  $\frac{\sum_{i=1}^{n}(a_i - \bar{a})^r}{\left[\sum_{i=1}^{n}(a_i - \bar{a})^2\right]^{r/2}} = o[n^{(2-r)/4}], \qquad (r = 3, 4, \cdots),$

where $\bar{a} = n^{-1}\sum_{i=1}^{n} a_i$.

Remark: The set $\mathfrak{A}_n$ need not be a subset of $\mathfrak{A}_{n+1}$.

The proof of this theorem will be omitted, since it is very similar to the proof
of another theorem by the author [4].

Theorem 2: If the $a_1, a_2, \cdots$ are independent observations on a chance variable
$X$ having positive variance and a finite absolute moment of order $4 + \delta$, $\delta > 0$,
condition (3.1) is satisfied unless possibly an event of probability 0 has occurred.

The proof of this theorem will be based on Markoff’s method for proving the
central limit theorem in the Liapounoff form.² Thus we shall show that there
exists a sequence of sequences $\mathfrak{B}_n = (b_{n1}, \cdots, b_{nn})$ such that unless possibly
an event of probability 0 has occurred, (i) there exists an index $n'$ (depending
on the given sequence) such that for $n > n'$, $\mathfrak{B}_n = \mathfrak{A}_n$, and (ii) the sequences $\mathfrak{B}_n$
satisfy condition (3.1) expressed in terms of the $b_{ni}$, $(i = 1, \cdots, n)$.

² See, for example, Uspensky [5], pp. 388-95.

It is no restriction to assume that $EX = 0$ since the addition of one and the
same constant to every $a_i$ does not change (3.1). Let

$N = N(n) = n^{1/(4+\delta/2)}$

and define for $i = 1, \cdots, n$

$b_{ni} = a_i, \quad c_{ni} = 0, \quad$ if $|a_i| \le N(n)$,

$b_{ni} = 0, \quad c_{ni} = a_i, \quad$ if $|a_i| > N(n)$,

so that $a_i = b_{ni} + c_{ni}$. Then $b_{ni}$ and $c_{ni}$ can be considered as observations on
chance variables $Y_n$ and $Z_n$, respectively, where

$Y_n = X, \quad Z_n = 0, \quad$ if $|X| \le N(n)$,

$Y_n = 0, \quad Z_n = X, \quad$ if $|X| > N(n)$.

Further let $p_n = P\{Z_n \ne 0\}$, $\mu_r(V) = EV^r$, $\beta_r(V) = E|V|^r$ where $V = X$,
$Y_n$, $Z_n$ and $r$ is positive integral, if these moments exist,
and finally, let $F(x)$ be the cdf of $X$.

In order to prove (i) consider the infinitely dimensional sample space $\Omega$ with the
generic point $\omega = \omega(a_1, a_2, \cdots)$ and let $E_n = \{\omega : |a_n| > N(n)\}$, $(n = 1, 2, \cdots)$.
Then $E_n$ has probability measure $p_n$. We shall show that $\sum p_n$ converges. Since

$\beta_{4+\delta} = \int_{-\infty}^{\infty} |x|^{4+\delta}\, dF(x) \ge N^{4+\delta}\left[\int_{-\infty}^{-N} dF(x) + \int_{N}^{\infty} dF(x)\right] = N^{4+\delta} p_n,$

we find

$p_n \le \frac{\beta_{4+\delta}}{N^{4+\delta}} = \frac{\beta_{4+\delta}}{n^{(4+\delta)/(4+\delta/2)}}.$

Now $(4 + \delta)/(4 + \delta/2) > 1$ and the infinite sum converges. It follows that the
set $E$ of points which belong to infinitely many sets $E_n$ has probability measure 0.
Thus for every point $\omega \in \Omega$ except those in a set of measure 0 there exists an
index $n_\omega$ (depending on $\omega$) such that for $n > n_\omega$

(3.2)  $|a_n| \le N(n).$

Further, since $n_\omega$ is finite and $N(n) \to \infty$, it follows that for these points there
exists a second index $n'_\omega \ge n_\omega$ such that in addition to (3.2) $|a_n| \le N(n'_\omega)$,
$(n = 1, \cdots, n_\omega)$. Thus except on a set of measure 0 the sequences $\mathfrak{B}_n$ are identical
with the sequences $\mathfrak{A}_n$ for $n \ge n'_\omega$. This proves (i).

In proving (ii) let $B_{nr} = \sum_{i=1}^{n} b_{ni}^r$, $(n, r = 1, 2, \cdots)$. We first note that under
the assumptions of the theorem $n^{-1} A_r \to \mu_r(X)$ for $r = 1, 2, 3, 4$ except on a set
of measure 0. Thus except on a set of measure 0

$A_1 = o(n), \quad A_2 = \Omega(n),^3 \quad A_3 = O(n), \quad A_4 = O(n),$

and therefore by the argument used in proving (i) again except on a set of
measure 0

$B_{n1} = o(n), \quad B_{n2} = \Omega(n), \quad B_{n3} = O(n), \quad B_{n4} = O(n).$

It follows that in order to prove (ii) it is sufficient to show that

(3.3)  $B_{nr} = o(n^{(r+2)/4}), \qquad (r = 5, 6, \cdots),$

except on a set of measure 0.

³ $f(n)$ is said to be of order $\Omega(n^k)$, $k$ real, if $f(n) = O(n^k)$ and $\liminf_{n \to \infty} |f(n)|/n^k > 0$.

Now for $r \ge 5$

$|\mu_r(Y_n)| \le \beta_r(Y_n) \le N^{r-4} \beta_4(Y_n) \le N^{r-4} \beta_4(X),$

and therefore

$\mu_r(Y_n) = O(N^{r-4}) = O[n^{(r-4)/(4+\delta/2)}].$

It follows that

$EB_{nr} = n\mu_r(Y_n) = O[n^{(r+\delta/2)/(4+\delta/2)}]$

and

$\operatorname{var} B_{nr} = n \operatorname{var} Y_n^r = n[\mu_{2r}(Y_n) - \mu_r^2(Y_n)] = O[n^{(2r+\delta/2)/(4+\delta/2)}],$

so that

$\sigma(B_{nr}) = O[n^{(r+\delta/4)/(4+\delta/2)}].$

Assume now that for some $r \ge 5$ (3.3) is not satisfied on a set $F_r$ having
measure $\rho_r > \rho > 0$. We shall show that this assumption leads to a contradiction,
and that therefore (3.3) is true.

Choose $\epsilon$ such that

(3.4)  $1/2 < \epsilon < (16 + r\delta)/(32 + 4\delta).$

Since $r \ge 5$, (3.4) can always be satisfied. Then the infinite sum $\sum_{n=1}^{\infty} (1/n^{2\epsilon})$
converges, and a positive constant $d$ can be found in such a way that

$\sum_{n=1}^{\infty} \frac{1}{d^2 n^{2\epsilon}} \le \rho.$

If we then write the Tchebysheff inequality

$P\{|B_{nr} - EB_{nr}| \ge d n^{\epsilon} \sigma(B_{nr})\} \le 1/d^2 n^{2\epsilon},$

it is seen that except on a set having at most measure $\rho$

$B_{nr} = O(\max[n^{(r+\delta/2)/(4+\delta/2)},\ n^{\epsilon} n^{(r+\delta/4)/(4+\delta/2)}]).$

Now for $r \ge 5$

$(r + \delta/2)/(4 + \delta/2) < r/4$

and by (3.4)

$\epsilon + (r + \delta/4)/(4 + \delta/2) = \epsilon + r/4 + (\delta/4 - r\delta/8)/(4 + \delta/2)$
$\qquad < r/4 + (16 + 2\delta)/(32 + 4\delta) = (r + 2)/4,$

so that the set on which (3.3) is not satisfied has at most measure $\rho$. This
contradicts the assumption, thus proving the theorem.


4. Consistency. To prove consistency of a test based on a statistic of
observations $a_1, \cdots, a_n$, the following procedure can be applied. Let the test
statistic be $S_n = S_n(a_1, \cdots, a_n)$ and denote by $E_n^0 = E_n^0(a_1, \cdots, a_n)$ and $V_n^0 =
V_n^0(a_1, \cdots, a_n)$ expected value and variance of $S_n$ under the assumption that
the set of random variables $X_1, \cdots, X_n$ is restricted to the subpopulation
consisting of the $n!$ equally likely permutations of the observations. Assume
that for the alternatives under consideration large values of $S_n$ are critical.
Then we reject the null hypothesis whenever $(S_n - E_n^0)/\sqrt{V_n^0} \ge k$ where $k$ is
some positive constant depending on the limiting distribution of $S_n$ under the
assumption of equally likely permutations and the level of significance. Thus
in order to prove consistency we have to show that

(4.1)  $\lim_{n \to \infty} P\left\{\frac{S_n - E_n^0}{\sqrt{V_n^0}} \ge k \,\Big|\, H_1\right\} = 1.$

(4.1) will be satisfied if for some $\epsilon > 0$

$\lim_{n \to \infty} P\left\{\frac{S_n - E_n^0}{\sqrt{nV_n^0}} \ge \epsilon \,\Big|\, H_1\right\} = 1.$

Thus we shall have proved consistency, if we can show that when $H_1$ is true,
$E_n^0/\sqrt{nV_n^0}$ converges in probability to 0 and there exists some $\epsilon > 0$ such that

$\lim_{n \to \infty} P\{S_n/\sqrt{nV_n^0} \ge \epsilon \mid H_1\} = 1.$

Applying this method to our problem and noting that a corresponding procedure could have been used in the case when small values of $S_n$ are critical, we obtain

Theorem 3: The test based on $R_h$ is consistent with respect to alternatives for which

(4.2) $E^0R_h/\sqrt{nV^0R_h} \to 0$ in probability

and there exists some $\epsilon > 0$ such that

(4.3) $\lim_{n\to\infty} P\{R_h/\sqrt{nV^0R_h} > \epsilon \mid H_1\} = 1,$

where $E^0R_h$ and $V^0R_h$ are given by (2.1) and (2.2), respectively.

In what follows it will always be assumed that under the alternative hypothesis the observations are independent and come from chance variables $X_n$ with continuous cdf's $F_n(x)$, $(n = 1, 2, \cdots)$. We shall often have the opportunity to make use of the fact that the test is not changed if one and the same constant is subtracted from every observation. This will be helpful in reducing our problem to one for which (4.2) is true.

Let $a_i$ be the rank of the observation $x_i$ on the chance variable $X_i$ $(i = 1, \cdots, n)$. Then it is no restriction to assume that these ranks take the special form

$-(n-1)/2,\ -(n-3)/2,\ \cdots,\ (n-1)/2,$

so that $A_1 = 0$, $A_2 = \frac{1}{12}(n^2 - 1)n = \Omega(n^3)$ and

(4.4) $V^0R_h \sim \frac{1}{144}n^5 = \Omega(n^5)$

and therefore (4.2) is always satisfied.

Before we can find conditions under which (4.3) is satisfied, we have to investigate the expected value and variance of $R_h$ when $H_1$ is true. For this purpose write $a_i = \sum_{j=1}^{n} y_{ij}$, $(i = 1, \cdots, n)$, where

(4.5) $y_{ij} = \tfrac12$ if $x_i > x_j$; $\quad y_{ii} = 0$; $\quad y_{ij} = -\tfrac12$ if $x_i < x_j$.

Then if $P\{X_i < X_j\} = p_{ij}$, $(i, j = 1, \cdots, n)$, we find

$Ey_{ij} = \tfrac12 p_{ji} - \tfrac12(1 - p_{ji}) = p_{ji} - \tfrac12 = \epsilon_{ij}$ (say).
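As a small illustration of the representation of ranks by pairwise scores, the sketch below builds the centered ranks from scores of the type (4.5) and checks $A_1 = 0$ and $A_2 = n(n^2-1)/12$. It assumes the sign convention $y_{ij} = +1/2$ when $x_i > x_j$ (the opposite convention only changes the sign of every $a_i$ and leaves $R_h$ unchanged), no ties, and a circular definition of $R_h$ in which the index $i + h$ is taken modulo $n$; the function names are hypothetical.

```python
def y(xi, xj):
    """Pairwise score in the style of (4.5): +1/2, 0 or -1/2."""
    if xi > xj:
        return 0.5
    if xi < xj:
        return -0.5
    return 0.0  # only reached for j = i when there are no ties

def centered_ranks(xs):
    """a_i = sum_j y_ij, the rank of x_i shifted to have mean 0."""
    return [sum(y(xi, xj) for xj in xs) for xi in xs]

def serial_R(a, h):
    """Circular serial product sum R_h = sum_i a_i * a_{i+h} (index mod n)."""
    n = len(a)
    return sum(a[i] * a[(i + h) % n] for i in range(n))

xs = [3.2, -1.0, 7.5, 0.4]
a = centered_ranks(xs)
A1 = sum(a)                    # first power sum of the ranks
A2 = sum(v * v for v in a)     # second power sum of the ranks
```

For the four sample values above, `a` is the list of centered ranks and `A2` agrees with $n(n^2-1)/12 = 5$.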

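Further,

(4.6) $R_h = \sum_{i=1}^{n} a_i a_{i+h} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} y_{ij}\,y_{i+h,k}.$

Therefore

(4.7) $E(R_h \mid H_1) = \sum_{i}\sum_{j}\sum_{k} \epsilon_{ij}\,\epsilon_{i+h,k} + O(n^2)$

and

(4.8) $\operatorname{var}(R_h \mid H_1) = E\sum_{ijk}\sum_{\alpha\beta\gamma} y_{ij}y_{i+h,k}y_{\alpha\beta}y_{\alpha+h,\gamma} - E\sum_{ijk} y_{ij}y_{i+h,k}\,E\sum_{\alpha\beta\gamma} y_{\alpha\beta}y_{\alpha+h,\gamma}$

$= \sum_{ijk}\sum_{\alpha\beta\gamma}\{E\,y_{ij}y_{i+h,k}y_{\alpha\beta}y_{\alpha+h,\gamma} - E\,y_{ij}y_{i+h,k}\,E\,y_{\alpha\beta}y_{\alpha+h,\gamma}\}.$

In (4.8) the expression in parentheses is 0 unless one of the Greek indices (including $\alpha + h$) equals one of the Roman indices. Therefore $\operatorname{var}(R_h \mid H_1) = O(n^5)$. It then follows from (4.4) that

$\lim_{n\to\infty} E(R_h \mid H_1)/\sqrt{nV^0R_h} = 12 \lim_{n\to\infty} \frac{1}{n^3}\sum_{i}\sum_{j}\sum_{k}\epsilon_{ij}\,\epsilon_{i+h,k}$

and we can state the following corollary to Theorem 3:

Corollary: When using ranks, the test based on $R_h$ is consistent, if under the alternative hypothesis

(4.9) $\frac{1}{n^3}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\epsilon_{ij}\,\epsilon_{i+h,k} = \Omega(1),$

where $\epsilon_{ji} = P\{X_i < X_j\} - \tfrac12$.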





Since $\epsilon_{ij} = -\epsilon_{ji}$, we can write

-‘T.Z 11 ■ .. - 1. fe,y). 

i 1 i *»)-*« 

and the teat is consistent if 

(4.10) Iim 1 /• X 0. 

et vfe 1i 


4.1. Downward (upward) trend. Assume that for $i < j$ and all $k$

(4.11) ut < 0 

and 


(4.12) 


<a o iqi ■ 


These requirementsuro equivalent b> V\ .V, • A',! •' 1 2 and/'! A', * AM T 
P(Xj < AT) anil are satisfied if the alleinulive to random ness i; a downward 
trend in the sense that J'\Lr) < i • > ‘ x * ' t . i - j», with at least 

one. interval of .strict inequality. 

(4.11) and (4.12) are not sufficient for (4.10) to be true. Thus assume in
addition that there exist a positive integer $u$ and a number $\epsilon < 0$ such that
1 U*b. j— urn* Co c then 


lim ~ L > lim J £ Z 

ttHM 71 -- >*’ ' ' - Jl 1 - 1 


n-*«o lr fc^l 


ik * I. - 
*-AI n* 


< V j 


> 2dhm iZft-li - »*)(« ~ fc-h 

n~» co 71° k 


l) 




i> > 0, 


and the test is consistent. 

The case of an upward trend can be treated in exactly the same way. The 
testis consistent with respect to alternatives for which for i < j and all <, tJ > 0, 
e a ^ <o*, a nd g.l.b. = e, where this time e > 0. 

Another test of randomness, the so-called T-test, has been proposed by
Mann [6] with exactly this alternative of a downward (upward) trend in mind.
This T-test is also consistent provided certain general conditions are satisfied.
Thus the question arises which of the two tests should be chosen if a downward
(upward) trend is feared. This question will be considered in Section 7.

4.2. Cyclical movement. Let the class of alternatives be specified by

(4.13) $\epsilon_{lg+\alpha,\,mg+\beta} = \epsilon_{\alpha\beta}$, $(\alpha, \beta = 1, \cdots, g > 1;\ l, m = 0, 1, \cdots)$,

in other words, assume that the statistic $R_h$ is used to test for randomness while
under the alternative hypothesis there exists a regular cyclical movement with a
period of length $g$. It is sufficient to consider the case $h \le g$.

If (4.13) is true, 


(414) 


= »* £ *<+K,. + 0(rc) = n% + 0(M), 





will'll' 

(l 1."') 
and 
(4,1 f'.t 



•1 ,l”3 


n 


i 


v 

« -3 


*rt, O * 


Thus in in'" *if .1 t* Mi*' i< t i !'■(*nt if t) y (1, 

II /< rj, v ndn* *- . ?*• a mu • «? jiuur- ,tud is thcrct'oic > I) if some t„. + 0. 

However if is jin 1 ild> fha' ■ "ini* >0 nil a!! r„,i y 0, hr / 1 1), and still e„. — 0. 

If this happens, the test is not consistent; otherwise it is consistent. If under $H_1$ the populations from which our respective observations are drawn differ only in location, the above-mentioned exceptional case cannot happen, and the test is always consistent with respect to this class of alternatives.

If h *. if, it is ted difficult to I'on-fiuef an example where Xv a«al €a, €a \ h,, 0 
while XX i t-„ f ,„, v ti, when*the »« are a permutation of the numbers I, • , g. 

Thus in this case it is not sufficient that some $\epsilon_{\alpha\cdot} \ne 0$ for the test to be consistent.
Consistency may also depend on the order of the elements of a period.

We may conclude that if $g$ is known, we should always choose $h = g$. If $g$
is not known, we may as well take $h = 1$.

4.3. Change in location. Turning now to the case when the test is performed
on the basis of the original observations, it will often be appropriate to assume
that under the alternative hypothesis the distribution remains the same except
for a location parameter. We shall consider only the case of a cyclical movement.

Thus let

$F_n(x) = F(x - m_n)$ $(n = 1, 2, \cdots)$,

where $F(x)$ is the cdf of a chance variable $U$ having mean 0, and $m_n$ is a location
parameter. It will also be assumed that $U$ has the positive variance $\sigma^2$ and a
finite fourth moment.

In the cyclical case with period $g$

(4.17) $m_{lg+\alpha} = m_\alpha$ $(\alpha = 1, \cdots, g > 1;\ l = 0, 1, \cdots)$.

We shall find conditions under which our test is consistent with respect to alternatives of this kind. Obviously we can assume that $\sum_{\alpha=1}^{g} m_\alpha = g\bar m = 0$, since otherwise we could have subtracted $\bar m$ from every observation. Writing then $x_n = u_n + m_n$, $(n = 1, 2, \cdots)$, where $u_n$ can be considered as an observation on the previously defined chance variable $U$, we find

$A_1 = \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} u_i + O(1),$

$A_2 = \sum_{i=1}^{n} u_i^2 + 2\sum_{i=1}^{n} u_i m_i + \sum_{i=1}^{n} m_i^2$

$= \sum_{i=1}^{n} u_i^2 + 2\sum_{\alpha=1}^{g} m_\alpha \sum_{l=0}^{n_\alpha} u_{lg+\alpha} + [n/g]\sum_{\alpha=1}^{g} m_\alpha^2 + O(1),$

where $n_\alpha$ is the largest integer such that $n_\alpha g + \alpha \le n$ and $[n/g]$ the largest integer $\le n/g$. $A_3$ and $A_4$ are given by similar expressions. Since we assumed that $EU = 0$, $EU^2 = \sigma^2 > 0$, and $EU^4 < \infty$, we have with probability 1

$\sum_{i=1}^{n} u_i = o(n), \quad \sum_{i=1}^{n} u_i^2 = \Omega(n), \quad \sum_{i=1}^{n} u_i^3 = O(n), \quad \sum_{i=1}^{n} u_i^4 = O(n),$

so that with the same probability

$A_1 = o(n), \quad A_2 = \Omega(n), \quad A_3 = O(n), \quad A_4 = O(n).$

It follows that with probability 1

$E^0R_h = o(n), \quad V^0R_h \sim A_2^2/n = \Omega(n),$

and condition (4.2) of Theorem 3 is satisfied.
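Since further

(4.18) $\operatorname{var} R_h = \sum_{i=1}^{n} \operatorname{var}(x_i x_{i+h}) + 2\sum_{i=1}^{n} \operatorname{cov}(x_i x_{i+h},\ x_{i+h}x_{i+2h})$

$= \sum_{i=1}^{n}\{(\sigma^2 + m_i^2)(\sigma^2 + m_{i+h}^2) - m_i^2 m_{i+h}^2\} + 2\sum_{i=1}^{n}[m_i m_{i+2h}(\sigma^2 + m_{i+h}^2) - m_i m_{i+h}^2 m_{i+2h}]$

$= \sum_{i=1}^{n}\{\sigma^4 + \sigma^2(m_i^2 + m_{i+h}^2 + 2m_i m_{i+2h})\} = O(n)$

and therefore except on a set of probability measure 0

$\operatorname{plim}_{n\to\infty} R_h/\sqrt{nV^0R_h} = \lim_{n\to\infty} E(R_h \mid H_1)/A_2 = \lim_{n\to\infty} n^{-1}E(R_h \mid H_1)\big/\big(\sigma^2 + g^{-1}\sum_{\alpha=1}^{g} m_\alpha^2\big),$

condition (4.3) is satisfied provided $\lim_{n\to\infty} n^{-1}E(R_h \mid H_1) > 0$. Now $E(R_h \mid H_1) = [n/g]\sum_{\alpha=1}^{g} m_\alpha m_{\alpha+h} + O(1)$, so that the test is consistent with respect to the class of alternatives (4.17) for which

$\sum_{\alpha=1}^{g} (m_\alpha - \bar m)(m_{\alpha+h} - \bar m) > 0,$

where $\bar m = g^{-1}\sum_{\alpha=1}^{g} m_\alpha$. Thus by the same argument as in the case of ranks, the test is consistent whenever $h = g$, while it may or may not be consistent if $h < g$.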
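The dependence of the consistency criterion on the lag $h$ is easy to explore numerically. The sketch below evaluates $\sum_\alpha (m_\alpha - \bar m)(m_{\alpha+h} - \bar m)$ for one period of location parameters, with the index $\alpha + h$ reduced modulo $g$ as implied by (4.17); the function name and the sample period are illustrative assumptions.

```python
def cyclic_criterion(m, h):
    """Sum over one period of (m_a - mbar) * (m_{a+h} - mbar), indices mod g."""
    g = len(m)
    mbar = sum(m) / g
    return sum((m[a] - mbar) * (m[(a + h) % g] - mbar) for a in range(g))

period = [2, 0, -2, 0]                          # one cycle of length g = 4
values = {h: cyclic_criterion(period, h) for h in (1, 2, 4)}
```

For this period the criterion is a positive sum of squares at $h = g = 4$, vanishes at $h = 1$ and is negative at $h = 2$, which illustrates why $h < g$ may or may not give a consistent test.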

5. Limiting distribution of $R_h$ under $H_1$ in case of ranks. For the remaining two sections, it is of importance to know conditions under which $R_h$ based on ranks is asymptotically normal under the alternative hypothesis. Using the method of moments, it can be shown that in this case the distribution of $[R_h - ER_h]/\sigma(R_h)$ tends to the normal distribution with mean 0 and variance 1 provided $\operatorname{var} R_h = \Omega(n^5)$.

Generalizing the method used in Section 4 in evaluating the variance of $R_h$, it is not difficult to see that $E(R_h - ER_h)^{2s+1} = O(n^{5s+2})$, $(s = 0, 1, \cdots)$. It follows that if $\operatorname{var} R_h = \Omega(n^5)$, the odd moments are asymptotically zero. By means of a more careful analysis, it is also possible to show that $E[R_h - ER_h]^{2s} \sim (2s - 1)(2s - 3) \cdots 3\,(\operatorname{var} R_h)^s$. This proves our statement.

6. Ranks versus original observations. We have seen in Section 4 that if the alternative hypothesis is characterized by a regular cyclical movement the test based on $R_h$ is consistent, both for original observations and for ranks, provided $h = g$, where $g$ is the length of a cycle. The question arises which test is more efficient, the one based on original observations or the one based on ranks.

In trying to answer this question, we shall make use of a procedure due to Pitman⁴, which allows us to compare two consistent tests of the hypothesis that some population parameter $\theta$ has the value $\theta^0$ against the alternatives $\theta > \theta^0$ using critical regions of size $\alpha$, $S_{in} > S_{in}(\alpha)$, $(i = 1, 2)$, where $S_{in}$ is a statistic having finite variance and $S_{in}(\alpha)$ is an appropriate constant. The relative efficiency of the second test with respect to the first test is defined as the ratio $n_1/n_2$ where $n_2$ is the sample size of the second test required to achieve the same power for a given alternative as is achieved by the first test using a sample of size $n_1$ with respect to the same alternative.

Let $E(S_{in} \mid \theta) = \psi_{in}(\theta)$, $\operatorname{var}(S_{in} \mid \theta) = \sigma^2_{in}(\theta)$, and $\psi'_{in}(\theta^0)/\sigma_{in}(\theta^0) = H_i(n)$. Assuming that the alternative is of the form $\theta_n = \theta^0 + k/\sqrt{n}$ where $k$ is a positive constant, Pitman has shown that the asymptotic relative efficiency of the second test with respect to the first test is given by $\lim_{n\to\infty} [H_2^2(n)/H_1^2(n)]$, provided there exists a number $\epsilon > 0$ such that for $\theta^0 \le \theta \le \theta^0 + \epsilon$

(6.1) $\psi'_{in}(\theta)$ exists;

as $\theta_n \to \theta^0$ with $n \to \infty$

(6.2) $\psi'_{in}(\theta_n)/\psi'_{in}(\theta^0) \to 1$

and

(6.3) $\sigma_{in}(\theta_n)/\sigma_{in}(\theta^0) \to 1$;

(6.4) $\lim_{n\to\infty} H_i(n)/\sqrt{n} = c_i$, where $c_i$ is some positive constant;

(6.5) the distribution of $[S_{in} - \psi_{in}(\theta)]/\sigma_{in}(\theta)$ tends to the normal distribution with mean 0 and variance 1 uniformly in $\theta$.

⁴ I should like to thank Professor Pitman for his kind permission to quote from his lectures on non-parametric statistical inference which he delivered at Columbia University during the spring semester 1948.

Condition (6.5) can be replaced by the weaker condition

(6.5') the distribution of $[S_{in} - \psi_{in}(\theta_n)]/\sigma_{in}(\theta_n)$ tends to the normal distribution with mean 0 and variance 1 as $n \to \infty$.

In our case, in order to insure consistency, it will be assumed that $h = g$. Consider the parameter

(6.6) $\theta = \frac{1}{h}\sum_{\alpha=1}^{h} (m_\alpha - \bar m)^2,$

where as before $m_\alpha$ is the expected value of the $(lh + \alpha)$th observation, $(l = 0, 1, \cdots)$. We want to find the asymptotic relative efficiency of the test performed on ranks with respect to the test performed on original observations as $\theta \to 0$ with $n \to \infty$.

Again it is no restriction to assume that

(6.7) $\bar m = \frac{1}{h}\sum_{\alpha=1}^{h} m_\alpha = 0.$

Assume further that the chance variable $U$ defined in 4.3 has a finite absolute moment of order $4 + \delta$, $\delta > 0$. Then the standardized test statistic is asymptotically equal to $\sqrt{n}R_h/A_2$ with probability 1 and, if the null hypothesis is true, it follows from Theorem 2 that with the same probability the statistic

$Q_n = \sqrt{n}\,\sum_{i=1}^{n} x_i x_{i+h} \Big/ \sum_{i=1}^{n} x_i^2$

has in the population of permutations of the observed sample values an asymptotically normal distribution with mean 0 and variance 1. This, however, is also the limiting distribution of $Q_n$ under random sampling when the null hypothesis is true, as follows from the results of Hoeffding and Robbins [3]. Thus it will be sufficient to find the asymptotic relative efficiency of the $R_h$-test for ranks with respect to the $Q_n$-test. In doing this, it will also be assumed that $U$ has a continuous density function $f(x) = F'(x)$, and, in order to simplify notation, that there are $nh$ observations instead of $n$.

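In finding $EQ(nh)$, let $x_{\alpha j} = x_{(j-1)h+\alpha}$ and $u_{\alpha j} = u_{(j-1)h+\alpha}$, $(\alpha = 1, \cdots, h;\ j = 1, \cdots, n)$. Then

$\frac{1}{nh}\sum_{\alpha=1}^{h}\sum_{j=1}^{n} x_{\alpha j}^2 = \frac{1}{nh}\sum_{\alpha=1}^{h}\sum_{j=1}^{n} (u_{\alpha j} + m_\alpha)^2 = \frac{1}{nh}\sum_{\alpha=1}^{h}\Big\{\sum_{j=1}^{n} u_{\alpha j}^2 + 2m_\alpha\sum_{j=1}^{n} u_{\alpha j} + nm_\alpha^2\Big\} \to \sigma^2 + \theta$ in probability;

further,

$E\sum_{\alpha=1}^{h}\sum_{j=1}^{n} x_{\alpha j}\,x_{\alpha,j+1} = n\sum_{\alpha=1}^{h} m_\alpha^2 = nh\,\theta,$

so that

$EQ(nh) \sim \sqrt{nh}\,\theta/(\sigma^2 + \theta) = \psi_{Qn}(\theta).$

Therefore

$\psi'_{Qn}(\theta) = \sqrt{nh}\,\sigma^2/(\sigma^2 + \theta)^2.$

Also by (4.18)

$\operatorname{var} Q(nh) \sim nh\cdot nh(\sigma^4 + 4\sigma^2\theta)/[nh(\sigma^2 + \theta)]^2 = (\sigma^4 + 4\sigma^2\theta)/(\sigma^2 + \theta)^2,$

which converges to 1 as $\theta \to 0$. It follows that

(6.8) $H_Q(nh) = \psi'_{Qn}(0)/\sigma_{Qn}(0) = \sqrt{nh}/\sigma^2.$

Conditions (6.1)-(6.5) are easily seen to be satisfied.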

Considering now the $R_h$-test for ranks, we know that $(nh)^{-5/2}R_h$ has finite
variance. From (4.7) and (4.14)-(4.16) it is found that

(0.9) A’| (n!i) " [{> 'll] \ r nh »; - y/'nh S fS e -p) — 'l / Kn(0) 

h' n-i V <1 / 

and after some compulations 


(0.10) \C(I» 

From (4.4) and (6.10)

$H_R(nh) = 12\sqrt{nh}\,\Big[\int_{-\infty}^{\infty} f^2(x)\,dx\Big]^2.$

Conditions (6.1)-(6.4) and (6.5') can be shown to be satisfied.

Thus the asymptotic relative efficiency of the test based on ranks with respect to the test based on original observations is

(6.11) $\lim_{n\to\infty} \frac{H_R^2(nh)}{H_Q^2(nh)} = \frac{144\,nh\,\big[\int_{-\infty}^{\infty} f^2(x)\,dx\big]^4}{nh/\sigma^4} = 144\,\sigma^4\Big[\int_{-\infty}^{\infty} f^2(x)\,dx\Big]^4.$

As is not difficult to see, this expression is independent of location and scale.
Let the chance variable $U$ have density function

$f(x) = 0$ for $x < -1$, $x > 1$,

$f(x) = (1 + x)/(1 + a)$ for $-1 \le x \le a$,

$f(x) = (1 - x)/(1 - a)$ for $a \le x \le 1$,

where $-1 < a < 1$, i.e., let the graph of $f(x)$ be given by the two straight lines connecting the points $(-1, 0)$ and $(1, 0)$ with the point $(a, 1)$. Then $EU = a/3$, $\operatorname{var} U = \frac{1}{18}(3 + a^2)$, $\int_{-\infty}^{\infty} f^2(x)\,dx = 2/3$, and (6.11) becomes $[\frac{8}{27}(3 + a^2)]^2$. Thus the asymptotic relative efficiency increases with $|a|$. For $a = 0$, it is equal to $64/81$; for $|a| = 1$, it is equal to $(32/27)^2$. It is equal to 1 for $a = \pm\sqrt{3/8}$.

This example shows that the asymptotic relative efficiency of the rank test with respect to the test based on original observations may be $< 1$, $= 1$, or $> 1$, depending on the density function $f(x)$. Unless $f(x)$ is explicitly given, no statement can be made as to which of the two tests is to be preferred.
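The three numerical values quoted for the triangular density can be reproduced with exact arithmetic. This is a minimal sketch that assumes (6.11) reads $144\,\sigma^4[\int f^2(x)\,dx]^4$, with $\operatorname{var} U = (3 + a^2)/18$ and $\int f^2(x)\,dx = 2/3$ as stated in the example; the function takes $a^2$ as its argument (a hypothetical choice) so that $a = \pm\sqrt{3/8}$ can be handled without square roots.

```python
from fractions import Fraction

def triangular_are(a_squared):
    """144 * sigma^4 * (integral of f^2)^4 for the triangular density with peak at a."""
    a_squared = Fraction(a_squared)
    var_u = (3 + a_squared) / 18      # variance of U
    int_f2 = Fraction(2, 3)           # integral of f(x)^2, independent of a
    return 144 * var_u ** 2 * int_f2 ** 4

are_symmetric = triangular_are(0)               # a = 0
are_extreme = triangular_are(1)                 # |a| = 1
are_break_even = triangular_are(Fraction(3, 8)) # a^2 = 3/8
```

Because the variance grows with $a^2$ while $\int f^2$ stays fixed, the efficiency is monotone in $|a|$, passing through 1 at $a^2 = 3/8$.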

We are now in a position to give at least a partial answer to a question raised in
[1]. In concluding their paper, Wald and Wolfowitz note that the problem dealt
with in this section can be posed not only when transforming to ranks, but also
for any transformation carried out by means of a continuous and strictly monotonic
function $h(x)$.

Let $t = h(x)$ be such a transformation, satisfying in addition the condition that
Pitman's procedure remains applicable for the transformed distribution. Corresponding
to $\sigma$ and $Q$ we shall use $\sigma_t$ and $Q_t$. Let $h(m_\alpha) = n_\alpha$, $h^{-1}\sum_{\alpha}(n_\alpha - \bar n)^2 = \eta$.
Then if $EQ_t = \psi_{Q_t n}(\theta)$, by (6.8), (6.9), and (6.10)


( 6 . 12 ) 


#Oin(g) 

de 




#q,„ dd dr\ 
dd dr] (10 9-o 


Vnh 


1 

[ r“ V i 

I- 

8^_ 

I_ 




- HqM), 


where g{t) is the inverse of h{x). Therefore by (6.8) and (6.12) 


1 

a j 

r nx) dx] 

-oo J 

i 

k 

r 

'—OO 

fmvio dtX 


and the asymptotic relative efficiency does not merely depend on $h(x)$, the
operator defining the transformation, but also very essentially on the underlying
distribution $f(x)$.


7. Comparison of the $R_h$- and T-tests. The T-test by Mann [6] designed to test for randomness against a downward trend is based on the statistic

$T = \sum_{i=1}^{n}\sum_{j>i} (y_{ji} + \tfrac12) = \sum_{i}\sum_{j>i} y_{ji} + \tfrac14 n(n - 1),$

where $y_{ij}$ is defined by (4.5). Making the same assumptions as in 4.1, Mann shows that under the null hypothesis $T$ has a limiting normal distribution with mean $\frac14 n(n - 1)$ and variance $\frac{1}{72}(2n^3 + 3n^2 - 5n)$, while under the alternative hypothesis

(7.1) $ET = \tfrac14 n(n - 1)(2\bar\epsilon_n + 1),$

where $\bar\epsilon_n$ is defined by $\sum_{i}\sum_{j>i}\epsilon_{ji} = \tfrac12 n(n - 1)\bar\epsilon_n$.

Let

$S_n = \frac{6}{n^{3/2}}\,[T - \tfrac14 n(n - 1)].$


When $H_0$ is true, $S_n$ is asymptotically normal with mean 0 and variance 1. If we then put $\Phi(\lambda) = \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-x^2/2}\,dx$, a critical region for testing $H_0$ is given by $S_n < -\lambda$, where $\lambda$ is determined in such a way that $\Phi(\lambda) = \alpha$, the level of significance.
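For concreteness, the T-statistic and its standardized form can be computed as follows. This is a sketch under stated assumptions: $T$ is taken to count the pairs $i < j$ with $x_i < x_j$ (so that small values point to a downward trend, in line with the critical region $S_n < -\lambda$), $S_n$ is taken to be $6n^{-3/2}[T - n(n-1)/4]$, and ties are assumed absent; the function names are hypothetical.

```python
def mann_T(xs):
    """Number of pairs i < j with xs[i] < xs[j]."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n) if xs[i] < xs[j])

def standardized_S(xs):
    """S_n = 6 * n**(-3/2) * (T - n(n-1)/4), approximately N(0, 1) under H_0."""
    n = len(xs)
    return 6.0 * (mann_T(xs) - n * (n - 1) / 4.0) / n ** 1.5

def exact_null_variance(n):
    """Null variance of T, (2n^3 + 3n^2 - 5n)/72 = n(n-1)(2n+5)/72."""
    return (2 * n ** 3 + 3 * n ** 2 - 5 * n) / 72.0

falling_series = [5, 4, 3, 2, 1]
rising_series = [1, 2, 3, 4, 5]
```

The scaling constant 6 comes from the leading term $n^3/36$ of the null variance, so for small $n$ the exact variance from `exact_null_variance` is the safer normalizer.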

When $H_1$ is true, we find from (7.1)

$E(S_n \mid \bar\epsilon_n) \sim 3\sqrt{n}\,\bar\epsilon_n.$

By paralleling the proof of asymptotic normality of $R_h$ under $H_1$ given in Section 5, it can be shown that $(S_n - ES_n)/\sigma(S_n)$ is asymptotically normal with mean 0 and variance 1 provided $\sigma(S_n) = \Omega(1)$. This is essentially the result obtained already by Hoeffding [7]. Thus the asymptotic power of the test based on $S_n$ is given by

(7.2) $P\{S_n < -\lambda\} \sim \Phi\Big(\frac{\lambda + 3\sqrt{n}\,\bar\epsilon_n}{\sigma(S_n)}\Big),$

converging to 1, provided $\lim_{n\to\infty} \sqrt{n}\,\bar\epsilon_n = -\infty$. This is the condition for consistency given by Mann.

We may ask for the asymptotic power of the $S_n$-test as $\bar\epsilon_n \to 0$ with $n \to \infty$. More exactly, instead of considering a certain alternative $\epsilon_{ij} = k_{ij}$, where the $k_{ij}$ are given constants, consider the alternative (changing with $n$)

(7.3) $\epsilon_{ij} = k_{ij}/\sqrt{n}.$

If then as $n \to \infty$

$\frac{2}{n(n - 1)}\sum_{i}\sum_{j>i} k_{ji} \to k$

and

$\sigma(S_n) \to 1,$

it follows from (7.2) that the asymptotic power of the $S_n$-test, and therefore of the T-test, for alternatives (7.3) is equal to

$\Phi(\lambda + 3k).$





Now consider the same situation when the statistic $R_h$ is used instead of $T$.
We know that when $H_0$ is true

71 * 7 » 

I in r '**• > 

1 ( 


where Ih it* given )»v i 1.0 is ;v vmp^ihe dlv n*<nmd with menu (I and variance 1, 
TlmKinthi.scaMdherntieid n-gein r-gr. hi l*yA'1, 4 Ifwewt^ ;s]Lh*<ij«ha,*) 
we find 

/# i /l (s ('ll 441 I d \/ ft I 

and asymptotically the power of the /,*!,-u •( i> 


(7.0 


v\u\ 


V. 


* c 


*</.■'. r 


ptovided o{It' n ) ■ £1(1)* Tima the ti*M i> nm-eteiit jjlniw,., v»U ^.How¬ 
ever, for the alternative (7.11 lend* to of,\ i u, provided that as n —> ® 


rr//;' i - I 


Thus the $R_h$-test is ineffective with respect to the alternative (7.3) in contrast
to the T-test. This means that for this alternative the asymptotic relative
efficiency of the $R_h$-test with respect to the T-test is 0.


Acknowledgment. The author wishes to acknowledge the valuable help of
Professor J. Wolfowitz who suggested the topic and under whose direction the
work was completed.


REFERENCES

[1] A. Wald and J. Wolfowitz, "An exact test for randomness in the non-parametric case, based on serial correlation," Annals of Math. Stat., Vol. 14 (1943), pp. 378-388.
[2] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton, 1946.
[3] W. Hoeffding and H. Robbins, "The central limit theorem for dependent random variables," Duke Math. J., Vol. 15 (1948), pp. 773-780.
[4] G. E. Noether, "On a theorem by Wald and Wolfowitz," Annals of Math. Stat., Vol. 20 (1949), pp. 455-458.
[5] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.
[6] H. B. Mann, "Nonparametric tests against trend," Econometrica, Vol. 13 (1945), pp. 245-259.
[7] W. Hoeffding, "A class of statistics with asymptotically normal distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 293-325.



THE DISTRIBUTION OF THE NUMBER OF EXCEEDANCES 1 

By E. J. Gumbel and H. von Schelling
New York and Naval Medical Research Laboratory, New London, Connecticut

0. The problem. We study the probability that the $m$th observation in a
sample of size $n$ taken from an unknown distribution of a continuous variate
will be exceeded $x$ times in $N$ future trials, and calculate the averages, the
moments, and the cumulative probability function of the number of exceedances.
This problem leads to the hypergeometric series. Our starting point is a special
case of a distribution studied by Wilks [3] who considered several order statistics
whereas we consider only one. His tolerance limits are special cases of our
cumulative probability function. Thus the present paper is, at the same time, a
specialization and a generalization of the work done by Wilks.

1. Distribution. From a continuous variate $\xi$ an alternative is constructed by choosing the $m$th among $n$ observations $\xi_m$ $(m = 1, 2, \cdots, n)$. The rank $m$ is counted from the top, which means that $m = 1$ $(m = n)$ stands for the largest (smallest) observation. The observation $\xi_m$ is thus the $m$th largest value. We ask: In how many cases $x$ will the past $m$th observation be equalled or exceeded in $N$ future trials taken from the same population? For the sake of simplicity, $x$ is called the number of exceedances.

If the initial probability $F(\xi_m) = F_m$ for a value less than $\xi_m$ is known, the alternative probability for exceeding $\xi_m$ is $1 - F_m$, and Bernoulli's theorem gives the probability

(1.1) $w_1(F_m, N, x) = \binom{N}{x}(1 - F_m)^x F_m^{N-x}$

that $x$ among $N$ future trials will exceed $\xi_m$. However, as a rule the probability $F_m$ is unknown. The only data known are the $n$ past observations. To eliminate the probability $F_m$, we introduce the distribution $v(F_m)$ of the frequency $F_m$ of the $m$th largest among $n$ values

(1.2) $v(n, m, F_m)\,dF_m = \binom{n}{m} m\,F_m^{n-m}(1 - F_m)^{m-1}\,dF_m,$

consider $F_m$ as a variate, and integrate (1.1) over all values of this variate. Thus $F_m$ is replaced by a function of $n$ and $m$.

¹ Opinions or conclusions contained in this paper are those of the authors. They are not to be construed as necessarily reflecting the views or endorsement of the Navy Department.

The convolution of (1.1) and (1.2) leads to the distribution $w(n, m, N, x)$ of the number of exceedances $x$ in $N$ future trials

(1.3) $w(n, m, N, x) = \dfrac{\binom{n}{m}\,m\,\binom{N}{x}}{(N + n)\binom{N+n-1}{m+x-1}}.$


This probability depends upon the parameters $n$, $m$, and $N$, but not upon the unknown probability $F_m$. Therefore it is distribution-free. If we are interested in the dependence of $w(n, m, N, x)$ on $x$ only, we simply write $w(x)$. The conditions for the integers $m$ and $x$, and for the probability $w(x)$ are

(1.3') $1 \le m \le n; \quad 0 \le x \le N; \quad \sum_{x=0}^{N} w(x) = 1.$

The distribution (1.3) possesses the following symmetry

(1.4) $w(n, m, N, x) = w(n, n - m + 1, N, N - x)$

which reads: The probability that the past $m$th value from above will be exceeded $x$ times in $N$ new trials is equal to the probability that the past $m$th value from below will be exceeded $N - x$ times.

The $nN$ probabilities $w(n, m, N, x)$ are linked by several recurrence formulas which follow from the usual combinatorial rules. For fixed $m$, the probability for $x + 1$ is obtained from the probability for $x$ by


(1.5) $w(n, m, N, x + 1) = w(n, m, N, x)\,\dfrac{(N - x)(m + x)}{(N + n - m - x)(x + 1)}$

w{n, n - m 1, N, X — x). 


In the same way, the probabilities $w(n, m, N, x + 1)$, $w(n, m + 1, N, x)$ and $w(n, x, N, m)$ are easily obtained from the probabilities $w(n, m, N, x)$. The distribution (1.3) has many aspects since, besides the number of exceedances $x$, also the rank $m$ and the number of future trials $N$ may be considered as variates. For $m = 1$ and $m = n$, the distribution of the number of exceedances over the largest value diminishes with $x$, and the distribution of the number of exceedances over the smallest value increases with $x$. For $x = 0$ and $m = 1$, we obtain from (1.3)

(1.6) $w(n, 1, N, 0) = \dfrac{n}{N + n} = w(n, n, N, N).$

For $x = 0$, $m = n$, the probability that the smallest observation will never be exceeded, equal to the probability that the largest value will always be exceeded, is very small, even for moderate sample sizes.

If $n$ is odd, then $m = (n + 1)/2$ corresponds to the median of the initial variable $\xi$, and the symmetry relation (1.4) becomes

(1.7) $w(n, (n + 1)/2, N, x) = w(n, (n + 1)/2, N, N - x).$




It is equally probable that the median of the $n$ past observations is surpassed
$x$ or $N - x$ times in $N$ future trials.
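The properties listed in this section can be verified with exact rational arithmetic. The sketch below assumes that (1.3) has the closed form $\binom{n}{m} m \binom{N}{x} / [(N+n)\binom{N+n-1}{m+x-1}]$, which is what integrating (1.1) against (1.2) gives; the function names are illustrative and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def w(n, m, N, x):
    """P(the m-th largest of n past values is exceeded x times in N future trials)."""
    if not (1 <= m <= n and 0 <= x <= N):
        raise ValueError("need 1 <= m <= n and 0 <= x <= N")
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def check(n, N):
    """Verify normalisation (1.3'), symmetry (1.4) and recurrence (1.5) exactly."""
    for m in range(1, n + 1):
        probs = [w(n, m, N, x) for x in range(N + 1)]
        assert sum(probs) == 1
        for x in range(N + 1):
            assert probs[x] == w(n, n - m + 1, N, N - x)
        for x in range(N):
            ratio = Fraction((N - x) * (m + x), (N + n - m - x) * (x + 1))
            assert probs[x + 1] == probs[x] * ratio
    return True
```

Calling `check(7, 5)` runs all three checks for every rank $m$ of a sample of size 7 with 5 future trials, and `w(7, 1, 5, 0)` gives the value $n/(N+n)$ of (1.6).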


2. The two asymptotic distributions. If both $n$ and $N$ are large, $m$ may increase with $n$ such that the quotient $m/n$ remains constant, and the $m$th values remain near the median. Or, $m$ remains constant such that $m \ll n$, and the $m$th values are extremes.

In the first case, let $n = N = 2k - 1$, where $k$ is large. Then $m = k$ is the rank of the median of the initial distribution. As shown in (1.7), the distribution of the number of exceedances over the initial median is symmetrical. To obtain the asymptotic distribution we reduce $x$ by writing

(2.1) $x = k + z\sqrt{k}$

where $z$ remains in a finite interval. The same reduction may be applied to $m$th values in the neighborhood of the initial median. The distribution of the number of exceedances over the initial median is, from (1.3) and (2.1),

$w(2k - 1, k, 2k - 1, x) = \text{const}\ \dbinom{2k - 1}{k + z\sqrt{k}} \Big/ \dbinom{4k - 3}{2k + z\sqrt{k} - 1}.$

Consider only the factors involving the variate $z$; then the right side becomes, by
Stirling's formula,

$\dfrac{(2k + z\sqrt{k} - 1)!\,(2k - z\sqrt{k} - 2)!}{(k + z\sqrt{k})!\,(k - z\sqrt{k} - 1)!}$

(2k 4- zv / fc)“ f ' zV '*’(2fc - zVk\ n ~^ k er*Vk+*Vk 

(ft + *V *)* +,vT ( h ~ zVic) k ~ lV * 


VT 


Combination of the factors with the same powers leads to 

(4fc 2 - kzT ( (2k + zVfc) (& ~ ^Vfc) V Vr 
(ft 2 - kz ) k \ (2ft - zVk)(h + eVh)/ 

( i- s) / tMzsa) 

0 - iT a 1 _ ^X 1 + 7*) 

Since $k$ and $\sqrt{k}$ are large, and $z$ is small, all factors lead to exponential functions
whence

$\lim_{k\to\infty} w(2k - 1, k, 2k - 1, x) = \text{const}\ e^{-z^2/2}$

and finally,

(2.2) $w(x) = \dfrac{1}{\sqrt{2\pi k}}\exp\Big[-\dfrac{(x - k)^2}{2k}\Big].$




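The number of exceedances over the initial median, $m = k$, in a large sample of size $2k - 1$ in $2k - 1$ future trials is normally distributed with mean, median, mode, and variance equal to $k$. Therefore the probabilities (2.2) may be called the distribution of normal exceedances.

In the second case where $N$ and $n$ are large, and $m$ and $x$ are small, a distribution analogous to the Poisson distribution will be obtained. To indicate that $N$ and $n$ are large, they are written $\underline{N}$ and $\underline{n}$. The probability

$w(\underline{n}, m, \underline{N}, x) = \dfrac{(x + m - 1)!\,(\underline{N} + \underline{n} - x - m)!\,\underline{n}!\,\underline{N}!}{(m - 1)!\,x!\,(\underline{n} - m)!\,(\underline{N} - x)!\,(\underline{N} + \underline{n})!}$

obtained from (1.3) becomes, by use of the Stirling formula,

(2.3) $w(\underline{n}, m, \underline{N}, x) = \dbinom{x + m - 1}{x}\dfrac{\underline{n}^m\,\underline{N}^x}{(\underline{N} + \underline{n})^{m+x}} = w(\underline{n}, \underline{n} - m + 1, \underline{N}, \underline{N} - x).$

If $\underline{n} = \underline{N}$ the preceding formula becomes

(2.4) $w(\underline{n}, m, \underline{n}, x) = \dbinom{x + m - 1}{x}\big(\tfrac12\big)^{m+x} = w(\underline{n}, \underline{n} - m + 1, \underline{n}, \underline{n} - x).$

This probability that the $m$th largest (or smallest) value will be exceeded $x$ times (or $\underline{n} - x$ times) in $\underline{n}$ future trials is independent of $\underline{n}$. Since $m$ is small compared to $\underline{n}$, the probabilities (2.4) may be called the distribution of rare exceedances. For $x = 0$, we obtain the probability

$w(\underline{n}, m, \underline{n}, 0) = \big(\tfrac12\big)^m = w(\underline{n}, \underline{n} - m + 1, \underline{n}, \underline{n})$

that the largest (or smallest) $m$th extreme value is never (or always) exceeded. For $m = 1$, and $\underline{n} = \underline{N}$, the probability

(2.5) $w(\underline{n}, 1, \underline{n}, x) = \big(\tfrac12\big)^{x+1} = w(\underline{n}, \underline{n}, \underline{n}, \underline{n} - x)$

that the largest (or smallest) value is exceeded $x$ times (or $\underline{n} - x$ times) is a geometric series.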

To obtain the moments of the distribution of rare exceedances (2.4) we construct its generating function

$G_x(t) = \big(\tfrac12\big)^m \sum_{x=0}^{\infty}\dbinom{x + m - 1}{m - 1}\Big(\dfrac{e^t}{2}\Big)^x.$

From the well known expression for the negative binomial follows

(2.6) $G_x(t) = \big(\tfrac12\big)^m\Big(1 - \dfrac{e^t}{2}\Big)^{-m},$

whence, by the usual procedure

(2.7) $\bar x = m.$

The mean number of exceedances over the $m$th value from above in the distribution of rare exceedances is $m$ itself. The second derivative of (2.6) for $t = 0$ leads to the variance

(2.8) $\sigma^2 = 2m$

which is the double of the variance in the Poisson distribution. This difference is easily explained: If we apply the Poisson law to the exceedances, we have to know the mean number of exceedances. In our case we only know one observed number of exceedances. Consequently the variance must be larger than in the Poisson case.


Graph 1



The variance for the distribution (2.2) of the normal exceedances was $(N + 1)/2$, whereas the variance (2.8) for the distribution of rare exceedances, $2m$, is much smaller since $m$ is small compared to $N$. This interesting relation will be generalized in paragraph 3.

For $m$ increasing, the distributions (2.4) spread as shown in graph 1. The distributions have two modes

(2.9) $\tilde x_1 = m - 2, \quad \tilde x_2 = m - 1$

except for $m = 1$, where the probability diminishes with $x$. The distributions (2.4) are similar to the Poisson distribution for integer $m$. However, for this distribution the modes are $m - 1$ and $m$.
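The mean, variance and double mode of the distribution of rare exceedances can be confirmed numerically. This is a minimal sketch assuming (2.4) reads $\binom{x+m-1}{x}(1/2)^{m+x}$; the infinite sums are truncated at a hypothetical cutoff of 400 terms, beyond which the terms are negligible for small $m$.

```python
from fractions import Fraction
from math import comb

def rare_w(m, x):
    """Distribution of rare exceedances: C(x+m-1, x) * (1/2)**(m+x)."""
    return Fraction(comb(x + m - 1, x), 2 ** (m + x))

def mean_and_variance(m, cutoff=400):
    """Mean and variance from sums truncated after `cutoff` terms."""
    probs = [rare_w(m, x) for x in range(cutoff)]
    mean = sum(x * p for x, p in enumerate(probs))
    second = sum(x * x * p for x, p in enumerate(probs))
    return float(mean), float(second - mean * mean)

mean4, var4 = mean_and_variance(4)
```

The ratio of successive terms is $(x + m)/(2(x + 1))$, which equals 1 exactly at $x = m - 2$; this is the reason for the two equal modes in (2.9).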







The similarity between the two distributions may also be seen from their behavior for large $m$. In this case, the Poisson distribution for the standardized variate $y = (x - m)/\sigma$ converges toward a normal distribution. The same holds for the distribution of rare exceedances. For the proof consider the standardized variate

(2.10) $y = (x - m)/\sqrt{2m}.$

Its moment generating function $G_y(t)$ becomes, from (2.6),

$G_y(t) = \big(2e^{t/\sqrt{2m}} - e^{2t/\sqrt{2m}}\big)^{-m}.$

The usual development leads to the second member

$G_y(t) = \Big(1 - \dfrac{t^2}{2m} + O(m^{-3/2})\Big)^{-m}.$

If we neglect the factors $O(m^{-3/2})$, we finally obtain

(2.11) $G_y(t) = e^{t^2/2}$

which is the normal generating function. Thus the distribution of rare exceedances converges toward normalcy in the same way as the Poisson distribution.
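The rate of this convergence can be seen by evaluating the generating function directly. The sketch below assumes the form $G_y(t) = (2e^{t/\sqrt{2m}} - e^{2t/\sqrt{2m}})^{-m}$ and compares it with $e^{t^2/2}$; it is only valid while $e^{t/\sqrt{2m}} < 2$, and the function names are hypothetical.

```python
import math

def mgf_standardized(t, m):
    """G_y(t) = (2 e^{s} - e^{2s})^{-m} with s = t / sqrt(2m); needs e^{s} < 2."""
    s = t / math.sqrt(2 * m)
    return (2 * math.exp(s) - math.exp(2 * s)) ** (-m)

def gap_to_normal(t, m):
    """Absolute difference between G_y(t) and the normal value exp(t^2 / 2)."""
    return abs(mgf_standardized(t, m) - math.exp(t * t / 2))

gaps = [gap_to_normal(1.0, m) for m in (10, 1000, 10 ** 6)]
```

The gap shrinks roughly like $m^{-1/2}$, in line with the neglected terms of order $m^{-3/2}$ being raised to the power $m$.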


3. Moments. We return to the general distribution (1.3). For the calculation of the moments, the hypergeometric series $F(\alpha, \beta, \gamma, 1)$ defined by

(3.1) $F(\alpha, \beta, \gamma, 1) = 1 + \dfrac{\alpha\beta}{1\cdot\gamma} + \dfrac{\alpha(\alpha + 1)\,\beta(\beta + 1)}{1\cdot 2\ \gamma(\gamma + 1)} + \cdots$

is used. The $(x + 1)$st member of this series is

(3.2) $f(x) = \dfrac{\alpha(\alpha + 1)\cdots(\alpha + x - 1)\ \beta(\beta + 1)\cdots(\beta + x - 1)}{x!\ \gamma(\gamma + 1)\cdots(\gamma + x - 1)}.$

On the other hand, the $(x + 1)$st member of the distribution $w(x)$ may be written, from (1.3), after changing the signs,

(3.3) $w(x) = \dfrac{n!\,(N + n - m)!}{(N + n)!\,(n - m)!}\cdot\dfrac{m(m + 1)\cdots(m + x - 1)}{x!}\cdot\dfrac{(-N)(-N + 1)\cdots(-N + x - 1)}{(m - n - N)(m - n - N + 1)\cdots(m - n - N + x - 1)}.$

This is the general member (3.2) of the hypergeometric series, if we write

(3.4) $\alpha = m, \quad \beta = -N, \quad \gamma = m - n - N.$





Therefore the probability $w(n, m, N, x)$ is the $(x + 1)$st member in the development of

$\dfrac{n!\,(N + n - m)!}{(N + n)!\,(n - m)!}\,F(m, -N, m - n - N, 1).$

Since the sum of the probabilities $w(x)$ must be unity, we obtain

(3.5) $F(m, -N, m - n - N, 1) = \dfrac{(N + n)!\,(n - m)!}{n!\,(N + n - m)!}.$

This relation will be used for the calculation of the factorial moments $\bar x_{[k]}$ of order $k$ which are, from (3.3),

(3.6) $\bar x_{[k]} = \dfrac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!}\sum_{x=k}^{N}\dfrac{N(N - 1)\cdots(N - x + 1)\ m(m + 1)\cdots(m + x - 1)}{(x - k)!\,(N + n - m)(N + n - m - 1)\cdots(N + n - m - x + 1)}.$

The first member in the sum is

(3.7) $\varphi(1) = \dfrac{N(N - 1)\cdots(N - k + 1)\ m(m + 1)\cdots(m + k - 1)}{0!\,(N + n - m)(N + n - m - 1)\cdots(N + n - m - k + 1)}.$

The second member is

$\varphi(2) = \varphi(1)\,\dfrac{(N - k)(m + k)}{1!\,(N + n - m - k)}.$

Generally, each successive member is obtained from the preceding one by the same rules as the successive members of the hypergeometric series (3.1). Consequently, from (3.6),

(3.8) $\bar x_{[k]} = \dfrac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!}\,\varphi(1)\Big\{1 + \dfrac{(N - k)(m + k)}{1!\,(N + n - m - k)} + \cdots\Big\}.$

The sum in the brackets is the hypergeometric series

$F(m + k, -(N - k), m - n - N + k, 1).$

If we replace, in (3.5), $m$ by $m + k$, $N$ by $N - k$, $n$ by $n + k$, we obtain for the sum in (3.8)

(3.9) $F(m + k, -(N - k), m - n - N + k, 1) = \dfrac{(N + n)!\,(n - m)!}{(n + k)!\,(N + n - m - k)!}.$


Introduction of (3 9) and (3.7) into (3.8) leads to the factorial moments 


(3.10) 


m(m + 1 ) • 
*W = - 


(m + h - 1 )N(N - 1) ■ (N - k + 1) 

(n l)(n + 2) ■ • ■ (n + k) 





and to the recurrent relation

(3.10') $\bar{x}_{[k]} = \bar{x}_{[k-1]}\,\dfrac{(m+k-1)(N-k+1)}{n+k}.$
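As a check on the closed form (3.10), the factorial moments can also be summed directly from the probabilities. The sketch below assumes the product form of w(x) given in (3.3); the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def w(n, m, N, x):
    """Probability of x exceedances, from the product form (3.3)."""
    value = Fraction(factorial(n) * factorial(N + n - m),
                     factorial(n - m) * factorial(N + n))
    for j in range(x):
        value *= Fraction((m + j) * (N - j), (j + 1) * (N + n - m - j))
    return value


def factorial_moment_direct(n, m, N, k):
    """Sum of x(x-1)...(x-k+1) w(x) over all x; the product is 0 when x < k."""
    return sum(prod(range(x - k + 1, x + 1)) * w(n, m, N, x)
               for x in range(N + 1))


def factorial_moment_closed(n, m, N, k):
    """Closed form (3.10)."""
    numerator = prod(m + i for i in range(k)) * prod(N - i for i in range(k))
    denominator = prod(n + 1 + i for i in range(k))
    return Fraction(numerator, denominator)


for k in range(4):
    print(k, factorial_moment_direct(9, 3, 9, k), factorial_moment_closed(9, 3, 9, k))
```

For k = 1 the closed form reduces to the mean Nm/(n + 1).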


If n and N are both of the same order of magnitude, and large compared to k,
the expression (3.10) simplifies to

(3.10'') $\bar{x}_{[k]} = m(m+1)\cdots(m+k-1).$


Graph 2 

Averages of numbers of exceedances. 



For k = 1 we obtain the mean number of exceedances $\bar{x}_m$ over the mth largest
value in N future trials

(3.11) $\bar{x}_m = N\,\dfrac{m}{n+1}.$

This expression is identical with the classical formula $\bar{x} = N(1 - F_m)$ in the
Bernoulli distribution (1.1), since the mean of $1 - F_m$ obtained from (1.2) is
m/(n + 1). In both distributions the means need not be integers. The mean
number of exceedances over the smallest value is n times the mean number of
exceedances over the largest value. If N = n + 1, we have $\bar{x}_m = m$, and the same
holds if n and N are large. If n is odd, and m = (n + 1)/2, the mean number of
exceedances over the median of n observations is N/2. The means $\bar{x}_m$ are traced
against m in Graph 2 for n = N = 9, and n = N = 10.




The mean number ${}_m\bar{x}$ of exceedances over the mth value from below is related
to $\bar{x}_m$ by

(3.12) $\bar{x}_m + {}_m\bar{x} = N.$

The variances $\sigma_m^2$ and ${}_m\sigma^2$ of the number of exceedances over the mth values
from above and below become, from (3.10),

$\sigma_m^2 = \dfrac{mN}{n+1}\left(1 + \dfrac{(m+1)(N-1)}{n+2} - \dfrac{mN}{n+1}\right).$

The choice of a common denominator leads, after trivial calculations, to

(3.13) $\sigma_m^2 = {}_m\sigma^2 = \dfrac{mN(n-m+1)(N+n+1)}{(n+1)^2(n+2)}.$

The variances increase with N and diminish strongly with increasing n. The
variance is maximum for m = (n + 1)/2, i.e. for the median observation where
it becomes

(3.13') $\sigma^2_{(n+1)/2} = \dfrac{N(N+n+1)}{4(n+2)}.$

The variances of the number of exceedances over the largest and the smallest
value are

(3.13'') $\sigma_1^2 = \sigma_n^2 = \dfrac{nN(N+n+1)}{(n+1)^2(n+2)}.$

The quotient of the variances of the median and of the extremes is

(3.14) $\dfrac{\sigma^2_{(n+1)/2}}{\sigma_1^2} = \dfrac{(n+1)^2}{4n} = \dfrac{\sigma^2_{(n+1)/2}}{\sigma_n^2}.$

Consequently the variance of the median is about n/4 times larger than the
variance of the extremes. In other words, the extremes are more reliable than the
median, and this quality increases with the sample size. This is a generalization
of the relation obtained in paragraph 2. Such a behavior seems singular. However,
it also holds for the uniform distribution, and for the distribution (1.2)
of the frequencies [1].
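The closed forms (3.11), (3.13) and the quotient (3.14) are easy to evaluate exactly. The sketch below does so for n = N = 9; the function names are illustrative, and the Bernoulli variance is taken as N p (1 - p) with p = m/(n + 1), as in the comparison that follows in the text.

```python
from fractions import Fraction


def mean_exceedances(n, m, N):
    """Mean (3.11)."""
    return Fraction(N * m, n + 1)


def variance_exceedances(n, m, N):
    """Variance (3.13)."""
    return Fraction(m * N * (n - m + 1) * (N + n + 1), (n + 1) ** 2 * (n + 2))


def bernoulli_variance(n, m, N):
    """Variance of a Bernoulli (binomial) count with p = m/(n + 1)."""
    p = Fraction(m, n + 1)
    return N * p * (1 - p)


n = N = 9
median_rank = (n + 1) // 2
var_median = variance_exceedances(n, median_rank, N)
var_extreme = variance_exceedances(n, 1, N)
print(var_median, var_extreme, var_median / var_extreme)
print(var_median / bernoulli_variance(n, median_rank, N))
```

For n = N = 9 the quotient of the two variances is 25/9, which agrees with (n + 1)^2/(4n).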

In Bernoulli's case, the variance $\sigma_B^2$ is, after replacing $1 - F_m$ by m/(n + 1),

$\sigma_B^2 = N\,\dfrac{m}{n+1} \cdot \dfrac{n-m+1}{n+1},$

whence, from (3.13),

$\sigma_m^2 = \sigma_B^2\,\dfrac{N+n+1}{n+2} > \sigma_B^2.$

The variance in our case is larger than in Bernoulli's case, since we do not assume
the knowledge of the probability F m which is required for the Bernoulli distribu- 



7 * M W I - 1 


* % 3 * S' 


* < 1 v + * H r n 1 \ ’ 




turn 1‘Vir ,V », i ■}, »'-* * f 1 ' ‘ !iV 

«lnttrib»i<i*m. ThT i* .» »v n* ■"dir f“ rrtV! * 1 ’ H 


• tl* 1 ‘ * it Hi'' licnumlh 


4. The mode and the median. We ask for the most probable number $\tilde{x}$ of
exceedances over the mth largest among n observations in N future
trials. If a mode exists, it must be an integer. Since the distribution w(x) decreases
(or increases) with x for m = 1 (or m = n), we assume

(4.1) n > m > 1.

The mode is obtained from the inequalities

(4.2) $w(n, m, N, \tilde{x} - 1) \le w(n, m, N, \tilde{x}) \ge w(n, m, N, \tilde{x} + 1)$

which lead, from (1.5), to

(4.3) $(m-1)\dfrac{N+1}{n-1} - 1 \le \tilde{x} \le (m-1)\dfrac{N+1}{n-1}.$

The length of the interval is unity, as for the Bernoulli distribution.

There are several cases where two modes exist.

a) Let the number of future trials N be such that

(4.4) N = k(n - 1) - 1

where k is a positive integer. Then the modes are, from (4.3),

(4.5) $\tilde{x}_{(1)} = k(m-1) - 1; \quad \tilde{x}_{(2)} = k(m-1).$

b) The modes (4.5) also hold if n and N are large compared to unity, and if
N = k'n, where k' is again an integer.

c) If n is odd, the median of the initial variate has the rank m = (n + 1)/2.
If, at the same time, N is odd, there are two modes, namely

(4.6) $\tilde{x}_{(1)} = (N-1)/2; \quad \tilde{x}_{(2)} = (N+1)/2.$

In the case N = n, the two modes $\tilde{x}_{(1)} = m - 1$, and $\tilde{x}_{(2)} = m$ differ by unity
from the modes valid in the two previous cases.

In the case n = N, and m ≠ (n + 1)/2, only one mode exists. To find its
location, consider first the case that n = N is even, and m ≤ n/2. Then the
upper limit in (4.3) is

$[m-1] + \dfrac{2}{n-1}(m-1) \le [m-1] + 1 - \dfrac{1}{n-1} < m.$

Since the interval has unit length, the mode is $\tilde{x} = m - 1$. If m > (n + 1)/2,
the lower limit is

$[m-2] + \dfrac{2}{n-1}(m-1) > [m-1].$

The case that n = N is odd is treated in the same way, and leads to the following
result: The most probable numbers of exceedances over the mth value in
N = n future trials are

(4.7)
$\tilde{x} = m - 1$ for m ≤ n/2; $\tilde{x} = m$ for m ≥ (n/2) + 1,
if n = N is even,
$\tilde{x} = m - 1$ for m < (n + 1)/2; $\tilde{x} = m$ for m > (n + 1)/2,
if n = N is odd.
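The rule (4.7) can be compared with a brute-force search for the largest probability. The following sketch builds w(x) from the ratio of successive members and normalises it; the function names are hypothetical, and the double mode at m = (n + 1)/2 for odd n = N is taken from (4.6).

```python
from fractions import Fraction


def probabilities(n, m, N):
    """w(0), ..., w(N), from the ratio of successive members, then normalised."""
    terms = [Fraction(1)]
    for x in range(N):
        ratio = Fraction((m + x) * (N - x), (x + 1) * (N + n - m - x))
        terms.append(terms[-1] * ratio)
    total = sum(terms)
    return [t / total for t in terms]


def modes(n, m, N):
    """All x at which w(x) attains its maximum (exact comparison)."""
    p = probabilities(n, m, N)
    top = max(p)
    return [x for x, px in enumerate(p) if px == top]


def rule_4_7(n, m):
    """Most probable number(s) of exceedances for N = n according to (4.6), (4.7)."""
    if 2 * m == n + 1:
        return [m - 1, m]
    return [m - 1] if 2 * m < n + 1 else [m]


for n in (9, 10):
    print(n, [modes(n, m, n) for m in range(1, n + 1)])
```

Exact fractions are used so that the tie between the two modes is detected reliably rather than lost to rounding.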


We now consider the median. If the probabilities w(x) are summed up from
x = 0 onward, there may exist an integer $\check{x}_m$ such that the probability for at
most $\check{x}_m$ exceedances is 1/2. This is the median number of exceedances. Such a
number need not exist. Assume, for example, N < n; then the probability
w(n, 1, N, 0) alone (see (1.6)) surpasses 1/2 and the number of exceedances over
the largest and the smallest value do not possess a median. If the median $\check{x}_m$
exists, it follows from the symmetry (1.4) that $N - \check{x}_m - 1$ is the median of the
number of exceedances over the mth value from below. The relation

(4.8) $\check{x}_m + {}_m\check{x} = N - 1$

differs from the corresponding relation (3.12) for the mean. In some special cases,
the median can be obtained immediately. For x = 0, m = 1, n = N, formula (1.6)
leads to

$w(n, 1, n, 0) = \tfrac{1}{2} = w(n, n, n, n).$

The probability that the largest (or smallest) of n past observations will never (or
always) be exceeded in n future trials is equal to 1/2. If n and N are odd, and m =
(n + 1)/2, the summation of equation (1.7) yields, with the help of (1.3'),

$\sum_{z=0}^{x} w(z) = \sum_{z=N-x}^{N} w(z) = 1 - \sum_{z=x+1}^{N} w(z).$

Now the median number of exceedances x is such that the two sums on the right
sides are equal to 1/2. Consequently the median number of exceedances in this
case is m - 1.
We claim that

(4.9) $\check{x}_m = m - 1$

for all m, provided that n = N. For the proof, consider the probability
W(n, m, N, x) that the mth largest value is exceeded at most x times in N future
trials. This is the sum of the first x + 1 members w(x). Let $F_\nu(\alpha, \beta, \gamma, 1)$
be the sum of the first ν members of the hypergeometric series (3.1). Then the
substitutions (3.4) and ν = x + 1 lead to

(4.10) $W(n, m, N, x) = \dfrac{\binom{n}{m}}{\binom{N+n}{m}}\, F_{x+1}(m, -N, m-n-N, 1).$




For the sums of the hypergeometric series $F_\nu(\alpha, \beta, \gamma, 1)$ the following recurrence
formula [2] is used,

(4.11) $\dfrac{(\gamma-\beta-\alpha)(\gamma-\beta-\alpha+1)\cdots(\gamma-\beta-1)}{(\gamma-\alpha)(\gamma-\alpha+1)\cdots(\gamma-1)}\, F_\nu(\alpha, \beta, \gamma, 1) = 1 - \dfrac{\beta(\beta+1)\cdots(\beta+\nu-1)}{(\gamma-\alpha)(\gamma-\alpha+1)\cdots(\gamma-\alpha+\nu-1)}\, F_\alpha(\nu, \gamma-\alpha-\beta, \gamma-\alpha+\nu, 1).$

The substitutions (3.4), and ν = x + 1 lead to

$\dfrac{(-n)(-n+1)\cdots(-n+m-1)}{(-n-N)(-n-N+1)\cdots(-n-N+m-1)}\, F_{x+1}(m, -N, m-n-N, 1) = 1 - \dfrac{(-N)(-N+1)\cdots(-N+x)}{(-n-N)(-n-N+1)\cdots(-n-N+x)}\, F_m(x+1, -n, -n-N+x+1, 1).$

This equation may be written, from (4.10),

(4.12) $W(n, m, N, x) = 1 - \dfrac{\binom{N}{x+1}}{\binom{N+n}{x+1}}\, F_m(x+1, -n, -n-N+x+1, 1).$


For x = m - 1, and N = n, the equation becomes

$W(n, m, n, m-1) = 1 - \dfrac{\binom{n}{m}}{\binom{2n}{m}}\, F_m(m, -n, -2n+m, 1).$

From (4.10) it follows that the second term on the right side is equal to the
left side

$W(n, m, n, m-1) = \dfrac{\binom{n}{m}}{\binom{2n}{m}}\, F_m(m, -n, -2n+m, 1).$

Consequently

(4.13) $W(n, m, n, m-1) = \tfrac{1}{2}.$

If n = N, the median number $\check{x}_m$ of exceedances over the mth largest value is m - 1,
as stated previously. The means, modes, and medians obtained from the exact
formulae (3.11), (4.7) and (4.9) are traced in Graph 2 for n = N = 9, and
n = N = 10.
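The two expressions (4.10) and (4.12) for the cumulative probability, and the result (4.13), can be verified with partial sums of the hypergeometric series. This is a sketch under the assumption that the factor in (4.10) is the ratio of binomial coefficients C(n, m)/C(N + n, m); the names `F_partial`, `W_direct` and `W_recurrence` are illustrative.

```python
from fractions import Fraction
from math import comb


def F_partial(a, b, c, nu):
    """Sum of the first nu members of the hypergeometric series F(a, b, c, 1)."""
    total, term = Fraction(0), Fraction(1)
    for j in range(nu):
        total += term
        if j < nu - 1:
            # next member from the current one; skipped after the last needed member
            term *= Fraction((a + j) * (b + j), (j + 1) * (c + j))
    return total


def W_direct(n, m, N, x):
    """Probability of at most x exceedances, as in (4.10)."""
    return Fraction(comb(n, m), comb(N + n, m)) * F_partial(m, -N, m - n - N, x + 1)


def W_recurrence(n, m, N, x):
    """The same probability through the recurrence form (4.12), for x < N."""
    factor = Fraction(comb(N, x + 1), comb(N + n, x + 1))
    return 1 - factor * F_partial(x + 1, -n, -n - N + x + 1, m)


print([W_direct(6, m, 6, m - 1) for m in range(1, 7)])
print(W_direct(5, 2, 7, 3), W_recurrence(5, 2, 7, 3))
```

The first line printed should consist of the value 1/2 for every rank m, in agreement with (4.13).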





5. Probabilities of at least one exceedance. If we sum up the probabilities
w(z) from zero up to a certain x (or from a certain x up to N), we obtain the
probabilities W(x) (or P(x)) for at most (or at least) x exceedances over the
mth past value in N future trials

(5.1) $W(x) = \sum_{z=0}^{x} w(z); \quad P(x) = \sum_{z=x}^{N} w(z)$

where

W(x) + P(x + 1) = 1; W(x - 1) + P(x) = 1.

The boundary conditions are

W(0) = w(0); W(N) = 1; P(0) = 1; P(N) = w(N).

From the symmetry (1.4) it follows that the probability for the mth value from
above to be exceeded at most x times is equal to the probability for the mth
value from below to be exceeded at least N - x times.

From (5.1) and (1.3) it follows for m = 1 (and m = n) that the probabilities
for the largest (or smallest) among n observations to be exceeded at most once
in n future trials converge toward 3/4 (or zero), respectively. If n is large, the
probability that the largest value will be exceeded at most x times in n future
trials is, by virtue of (2.5),

(5.2) $W(n, 1, n, x) = 1 - \left(\tfrac{1}{2}\right)^{x+1} = P(n, n, n, n - x)$

independent of n.

Consider now the probability that the mth largest value will be exceeded at
least once in N future trials

(5.3) $P(n, m, N, 1) = 1 - \dfrac{n!\,(N+n-m)!}{(n-m)!\,(N+n)!} = W(n, n-m+1, N, N-1).$

If N and n are large, and m is small, this expression becomes

$P(n, m, N, 1) = 1 - \left(\dfrac{n}{N+n}\right)^{m} = W(n, n-m+1, N, N-1).$

For m = 1 and n = N, the probability is independent of the size of n.

The least number of exceedances over the smallest value for given probabilities
P, called the tolerance limit, has been derived by S. S. Wilks [3]. A related
problem is the following: How many trials N have to be made in order that there
is a given probability α for the mth largest value to be exceeded at least once? By
virtue of (5.3) we obtain N from

(5.4) $1 - \alpha = \dfrac{n!\,(N+n-m)!}{(n-m)!\,(N+n)!}.$

For the largest value m = 1, this equation leads to

(5.5) $\dfrac{N}{n} = \dfrac{1}{1-\alpha} - 1.$

Of course, N/n increases with α. If n is large, and m remains small, equation
(5.4) leads, in first approximation, to

(5.6) $\dfrac{N}{n} = (1-\alpha)^{-1/m} - 1.$

The quotients N/n as function of α are traced in Graph 3. The quotient is
plotted vertically against 1/(1 - α) plotted horizontally, both in logarithmic
scales. The abscissa shows the probability α. The curve for m = 1 is exact. The
corresponding curves for the penultimate, and the two preceding values
(m = 2, 3, 4) are obtained from the approximation (5.6). The graph reads in the
following way: The probability that the largest, or second, or third, or fourth
value from above are exceeded at least once in 100n, or 9n, or 3.6n, or 2.2n future
trials is α = .99. Inversely, in 4n future trials the probability that the largest,
or the second, or the third, or fourth extreme value is exceeded at least once
is α = 0.80, or 0.96, or 0.992, or 0.9984, respectively.
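The readings quoted from the graph can be reproduced from (5.3) and the approximation (5.6). The sketch below uses illustrative function names; `alpha_for_ratio` is simply (5.6) solved for α.

```python
from math import factorial


def alpha_exact(n, m, N):
    """Exact probability (5.3) of at least one exceedance of the mth largest value."""
    return 1 - factorial(n) * factorial(N + n - m) / (factorial(n - m) * factorial(N + n))


def ratio_needed(alpha, m):
    """Approximate quotient N/n from (5.6) for a given probability alpha."""
    return (1 - alpha) ** (-1 / m) - 1


def alpha_for_ratio(ratio, m):
    """Relation (5.6) solved for alpha, given the quotient N/n."""
    return 1 - (1 + ratio) ** (-m)


for m in (1, 2, 3, 4):
    print(m, round(ratio_needed(0.99, m), 2), round(alpha_for_ratio(4, m), 4))
```

For m = 1 the quotient is 99, which the graph reading in the text rounds to 100n.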

In a similar way we calculate the probabilities that the largest (and penultimate)
among n observations is exceeded at least twice in N future trials. Let
$\alpha_2$ be this probability. Then we have for the largest value

$1 - \alpha_2 = w(n, 1, N, 0) + w(n, 1, N, 1) = \dfrac{n}{n+N}\left(1 + \dfrac{N}{n+N-1}\right).$

For n sufficiently large, the expression simplifies to

$1 - \alpha_2 = \dfrac{1 + 2N/n}{(1 + N/n)^2}.$

The probability $\alpha_2$ as function of N/n is also traced in Graph 3 and designated
by m = 1, x = 2. Finally, for m = 2 the probability $\alpha_2$ for the penultimate value
to be exceeded at least twice is obtained for large n by

$1 - \alpha_2 = \dfrac{1 + 3N/n}{(1 + N/n)^3}.$

This probability $\alpha_2$ is also traced in Graph 3 and designated by m = 2, x = 2.
If we fix the probabilities $\alpha_2$, the graph shows the number of future trials
corresponding to 1 and 2 exceedances over the largest, the penultimate, and the
two preceding observations.



[Graph 3. Quotient N/n (vertical) against 1/(1 - α) (horizontal), both in logarithmic scales.]


6. Applications. In 50% of all cases, the largest (or smallest) of n past observations
will not (or always) be exceeded in N = n future trials. The mean number
of exceedances is the mean in the Bernoulli distribution. Its variance is largest
for the median, and smallest for the extremes, and this superiority of the extremes
increases with the sample size.

If the previous, and the future sample sizes both are large and equal, the distribution
of the number of exceedances over the median observation is normal
with mean and variance of the order n/2, whereas the distribution of the exceedances
over the mth extremes (the law of rare exceedances), similar to the
Poisson distribution, has the mean m, and the variance 2m, m being small compared
to the sample size. Elementary calculations lead to the setting of sample
sizes N corresponding to given probabilities for 1 or 2 exceedances over the past
largest and penultimate observation.

These methods may be of interest for forecasting floods if, instead of the size
of the flood, we are interested only in the frequency. The same procedure may
also be applied to other meteorological phenomena such as droughts, the extreme
temperatures (the killing frost), the largest precipitations, etc., and permits us to
forecast the number of cases surpassing a given severity within the next N years.


REFERENCES

[1] E. J. Gumbel, "Simple tests for given hypotheses," Biometrika, Vol. 32 (1942).

[2] H. von Schelling, "A formula for the partial sums of some hypergeometric series,"
Annals of Math. Stat., Vol. 20 (1949), No. 1.

[3] S. S. Wilks, "Statistical prediction with special reference to the problem of tolerance
limits," Annals of Math. Stat., Vol. 13 (1942).



ON THE ASYMPTOTIC DISTRIBUTION OF THE SUM OF POWERS 
OF UNIT FREQUENCY DIFFERENCES 1 

By Bradford F. Kimball
New York State Department of Public Service

1. Summary. Since the "unit" frequency differences (see (2.2) below) are
dependent, the usual methods for establishing the normal character of the
asymptotic distribution of the sum of random variables fail.

However, the essential character of the distribution is disclosed by the integral
functional relationship (3.6). From this it is possible to show that for large
samples the distribution approximates "stability" in the normal sense ([2] and
Lemma 2).

Using the condition that the third logarithmic derivative of the characteristic
function is uniformly bounded for all n on a neighborhood of t = 0 one can
prove that the asymptotic distribution exists and is normal.

2. Introduction. Consider a one dimensional statistical universe characterized
by a cumulative frequency function (cdf) F(x) which is continuous. Consider
an ordered random sample $x_i$ of size N such that

(2.1) $x_i < x_{i+1}, \quad i = 1$ to $N - 1.$

Consider frequency differences $u_i$ defined by

(2.2) $u_1 = F(x_1), \quad u_{N+1} = 1 - F(x_N), \quad u_{i+1} = F(x_{i+1}) - F(x_i), \quad i = 1$ to $N - 1.$

Thus

(2.3) $\sum_{N+1} u_i \equiv 1,$

and the formal integral of the probability density function (pdf) of the $u_i$ taken
over the complete sample space of $x_i$ can be written as

(2.4) $N! \int du_1\, du_2 \cdots du_{h-1}\, du_{h+1} \cdots du_{N+1} = 1,$

where $u_h$ is any $u_i$ which it is found convenient to omit, and the region of integration
is the N-fold Euclidean space bounded by the coordinate hyperplanes

$u_i = 0, \quad i \ne h, \quad i = 1, 2, \cdots, N + 1,$

and the hyperplane

(2.5) $u_1 + u_2 + \cdots + u_{h-1} + u_{h+1} + \cdots + u_{N+1} = 1.$

(See [1].)

1 This is the second paper in connection with the subject announced in Abstract No. 9,
Annals of Math. Stat., Vol. 17 (1946), p. 602, and Abstract No. 331, Bull. Am. Math. Soc.,
Vol. 52 (1946), p. 827. For first paper, see [1].


Consider a test function $y_M$ defined by

(2.6) $y_M = \sum_{M} u_i^p, \quad M \le N + 1,$

where p is a real positive number, M is an integer less than or equal to N + 1
and such that if M < N + 1 the $u_i$ which are to be omitted may be arbitrarily
selected, but the subscripts indicating the order relation (2.2) are for the present
retained.

Consider the case where N is odd and M is even, and set

(2.7) N = 2n + 1, M = 2m.

Divide the set of N + 1 frequency differences $u_i$ defined by (2.2) into two
subsets such that each subset contains n + 1 differences of which exactly m are
included in the test function (2.6). Now let N become infinite over odd numbers
$N_1, N_2, \cdots$. In other words the sample size is to increase without limit. For
each sample size $N_j$ in such a sequence let $M_j$ be an even number such that

(2.8) $M_j \le N_j + 1$

and such that the ratio $M_j/N_j$ is controlled for large values of N by

(2.9) $\lim M_j/N_j = \text{constant } c, \quad 0 < c \le 1.$

As above for each step in the sequence the set of $N_j + 1$ frequency differences
$u_i$ is divided into two subsets of $n_j + 1$ frequencies each with

(2.10) $N_j = 2n_j + 1, \quad M_j = 2m_j,$

such that $m_j$ frequencies of each subset are included in the test function

(2.11) $y_{M_j} = \sum_{M_j} u_i^p.$

Now we note that for a random sample of size N taken from the above universe,
the characteristic function $G_N(t; y_M)$ may be defined by

(2.12) $G_N(t; y_M) = N! \int e^{ity_M}\, du_1\, du_2 \cdots du_N$

taken over region in Euclidean space of N dimensions as indicated for the
integral (2.4), taking index h equal to N + 1.

3. Proof of integral relationship, Lemma 1. For simplicity of notation drop
subscripts from $M_j$, $N_j$, $n_j$ and $m_j$. We separate the test function $y_M$ into two
parts $y_m$ and $y_{m'}$, such that

(3.1) $y_M = y_m + y_{m'} = \sum_{m} u_i^p + \sum_{m'} u_i^p, \quad m = m' = M/2,$

where the m frequency differences $u_i$ in $y_m$ are those included in first subset and
those contained in $y_{m'}$ are those of the original M frequencies included in the
second subset (see (2.10) and (2.11)).





The formal integral defining $G_N(t; y_M)$ may be written

(3.2) $G_N(t; y_M) = \Gamma(2n+2) \int_{R_1} e^{ity_m}\, du_1 \cdots du_{n+1} \int_{R_2} e^{ity_{m'}}\, du_{n+2} \cdots du_{2n+1},$

where

$R_1$ = (n + 1)-dimensional Euclidean space bounded by coordinate hyperplanes
and plane w = 1,

$R_2$ = n-dimensional Euclidean space bounded by the coordinate hyperplanes
and the plane

(3.3) $u_{n+2} + u_{n+3} + \cdots + u_{2n+1} = 1 - w, \quad w = u_1 + u_2 + \cdots + u_{n+1}.$

Now introduce the transformation to $u_i'$

(3.4) $u_i'(1 - w) = u_i, \quad i = n + 2, n + 3, \cdots, 2n + 1, 2n + 2.$

Thus we have

$\sum_{n+1} u_i' \equiv 1,$

and the n $u_i'$ involved in the integration are bounded above by the hyperplane
$\sum_{n} u_i' = 1$. The Jacobian is $(1 - w)^n$.

Similarly under transformation

(3.5) $v_i w = u_i, \quad i = 1, 2, \cdots, n + 1, \qquad \sum_{n+1} v_i \equiv 1.$

Let $v_i$, i = 1, 2, ..., n and w replace the remaining variables of integration.
Thus the region of integration of these $v_i$ is $v_i > 0$ with the hyperplane $\sum_{n} v_i = 1$
furnishing the upper bound. The Jacobian of the transformation is $w^n$.

The regions of integration of these new variables $u_i'$ and $v_i$ are seen to be
independent of each other and of w. Noting effect of above transformations on
$y_m$ and $y_{m'}$, the integral (3.2) will be found to reduce to the following form:

(3.6) $G_N(t; y_M) = \dfrac{\Gamma(2n+2)}{\Gamma^2(n+1)} \int_0^1 w^n (1-w)^n\, G_n(tw^p; y_m)\, G_n(t(1-w)^p; y_{m'})\, dw,$

where

N = 2n + 1, M = 2m.

Lemma 1. This functional relationship holds for all values of N and M subject
to the condition that N be an odd integer and M an even integer. One may note that a
similar integral functional relationship will hold for any partition $(n_0, n_1)$ of the
N - 1 free frequency differences such that

$n_0 + n_1 = N - 1, \quad m_0 + m_1 = M,$

with corresponding changes in the Gamma functions which precede the integral.





In order to find out what happens when N becomes large the partially normalized
test function $z_M$ is introduced. This is defined by

(3.7) $z_M = (y_M - \bar{y}_M)(N + 1)^p/\sqrt{M},$

where (cf. [1], formula (3.1))

(3.8) $\bar{y}_M = E(y_M) = \dfrac{M\,\Gamma(N+1)\,\Gamma(p+1)}{\Gamma(N+1+p)}.$

I have referred to $z_M$ as a partially normalized variable, since

$E(z_M) = 0,$

(3.9) $\lim_{N \to \infty} E(z_M^2) = \Gamma(2p+1) - \Gamma^2(p+1) - cp^2\,\Gamma^2(p+1),$

where this limit can be shown to be greater than zero for

(3.10) $p \ne 1, \; 0 < c \le 1; \qquad p = 1, \; 0 < c < 1.$
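The mean (3.8) and the limiting variance (3.9) can be evaluated with the standard library gamma functions. This is a minimal sketch with illustrative function names; the log-gamma form is used only to avoid overflow for large N and is not taken from the paper.

```python
from math import exp, gamma, lgamma


def mean_y(M, N, p):
    """Mean (3.8): M * Gamma(N+1) * Gamma(p+1) / Gamma(N+1+p), via log-gamma."""
    return M * exp(lgamma(N + 1) + lgamma(p + 1) - lgamma(N + 1 + p))


def limiting_variance(p, c):
    """Limit (3.9) of E(z_M^2) as N becomes infinite with M/N tending to c."""
    return gamma(2 * p + 1) - gamma(p + 1) ** 2 - c * p ** 2 * gamma(p + 1) ** 2


print(mean_y(4, 7, 1.0))            # p = 1: M / (N + 1)
print(limiting_variance(2.0, 1.0))  # p = 2, c = 1
print(limiting_variance(1.0, 1.0))  # excluded case p = 1, c = 1
print(limiting_variance(0.5, 1.0))
```

The case p = 1, c = 1 gives a limit of zero, which corresponds to the exclusion in (3.10).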

Recalling the separation of the test function into two parts (see (3.1)) we
define $\bar{y}_m$ and $\bar{y}_{m'}$ by

(3.11) $\bar{y}_m = \bar{y}_{m'} = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+1+p)}, \quad M = 2m, \; N = 2n + 1.$

From Stirling’s formula it can then be shown that 

(3.12) (N + IYvmIVM = ( 27 V 2 ) 2 [(ti + 1 )’y„/Vm] + o(l), 

where o(1) goes to zero as N and M become infinite subject to the condition
(2.9). Thus if we define $z_m$ and $z_{m'}$ by

(3.13) $z_m = (y_m - \bar{y}_m)(n + 1)^p/\sqrt{m}, \quad z_{m'} = (y_{m'} - \bar{y}_{m'})(n + 1)^p/\sqrt{m},$

since

$y_M = y_m + y_{m'},$

it follows that

(3.14) $(N + 1)^p/\sqrt{M} = (2^p/\sqrt{2})(n + 1)^p/\sqrt{m}, \qquad z_M = (2^p/\sqrt{2})(z_m + z_{m'}) + o(1).$


Hence if we denote the characteristic function of the distribution of the
partially normalized test function $z_M$ by $G_N(t; z_M)$ and proceed to develop an
integral functional relationship similar to (3.6), one arrives at

(3.15) $G_N(t; z_M) = e^{it\,o(1)}\,\dfrac{\Gamma(2n+2)}{\Gamma^2(n+1)} \int_0^1 w^n (1-w)^n\, G_n[t(2w)^p/\sqrt{2}; z_m]\, G_n[t\,2^p(1-w)^p/\sqrt{2}; z_{m'}]\, dw$

with

N = 2n + 1, M = 2m.


4. Resulting functional relationship when N becomes large. The second
lemma shows that the functional equation satisfied by the characteristic function
of a normal distribution is approximated when N is large. Suppose we now set

(4.1) w = (1 + s)/2, 1 - w = (1 - s)/2, dw = ds/2.

Substituting in (3.15) we have

(4.2) $G_N = e^{it\,o(1)}\,\dfrac{\Gamma(2n+2)}{2^{2n+1}\,\Gamma^2(n+1)} \int_{-1}^{+1} (1 - s^2)^n\, G_n[t(1+s)^p/\sqrt{2}; z_m]\, G_n[t(1-s)^p/\sqrt{2}; z_{m'}]\, ds.$

Set

(4.3) $H(t, s) = G_n[t(1+s)^p/\sqrt{2}; z_m]\, G_n[t(1-s)^p/\sqrt{2}; z_{m'}].$

Then

(4.4) $H_s = G_n' G_n\, tp(1+s)^{p-1}/\sqrt{2} - G_n G_n'\, tp(1-s)^{p-1}/\sqrt{2}.$

Using law of mean write

(4.5) $H(t, s) = H(t, 0) + sH_s[t, h(s)], \quad 0 < |h(s)| < |s|.$

Substituting in (4.2) we have

(4.6) $e^{-it\,o(1)}\, G_N = H(t, 0) + \dfrac{\Gamma(2n+2)}{2^{2n+1}\,\Gamma^2(n+1)} \int_{-1}^{+1} H_s[t, h(s)]\,(1 - s^2)^n\, s\, ds.$

With $E(z_m) \equiv 0$, from the fact that the limiting variance of $z_m$ is bounded
(see (3.9)) it follows that the first derivative of its characteristic function remains
bounded in any finite interval, for all n ([3], p. 90). Thus

(4.7) $|G_n'(t; z_m)| < A, \quad 0 < |t| < D,$ for all n.

For case p ≥ 1, by virtue of condition (4.7) $H_s$ will remain bounded over
interval of integration of (4.6) as N becomes infinite. Let B denote such upper
bound of the absolute value of $H_s$. Then, carrying out the integration

(4.8) absolute value of integral $\le \dfrac{B\,\Gamma(2n+2)}{2^{2n+1}\,\Gamma^2(n+1)} \cdot \dfrac{1}{n+1}$

for any value of t. This quantity approaches zero as N goes to infinity uniformly
for t on any finite range. For the case that 0 < p < 1 a similar argument may
be used by including the factor $(1 - s)^{p-1}$ which appears in $H_s$ in the integration,
and placing the upper bound on the absolute value of the factor $G_n G_n'$.
Substituting back for H(t, 0) in (4.6) one arrives at

Lemma 2. The characteristic function $G_N(t; z_M)$ satisfies the relationship

(4.9) $G_N(t; z_M) = [G_n(t/\sqrt{2}; z_m)]^2 + o(1), \quad N = 2n + 1, \; M = 2m,$

where o(1) goes to zero with increasing n, uniformly for t on any finite interval

(4.10) $0 < |t| < D.$
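The constant in front of the integrals in (4.2) and (4.6), and the way the bound (4.8) shrinks with n, can be checked in exact arithmetic. The following sketch assumes B = 1 for illustration, and its function names are hypothetical; the integrals are evaluated by binomial expansion of $(1 - s^2)^n$.

```python
from fractions import Fraction
from math import comb, factorial


def K(n):
    """Constant Gamma(2n+2) / (2^(2n+1) Gamma(n+1)^2) preceding the integral."""
    return Fraction(factorial(2 * n + 1), 2 ** (2 * n + 1) * factorial(n) ** 2)


def integral_weight(n):
    """Exact integral of (1 - s^2)^n over [-1, 1], by binomial expansion."""
    return sum(Fraction((-1) ** j * comb(n, j) * 2, 2 * j + 1) for j in range(n + 1))


def integral_abs_s_weight(n):
    """Exact integral of |s| (1 - s^2)^n over [-1, 1], by binomial expansion."""
    return sum(Fraction((-1) ** j * comb(n, j), j + 1) for j in range(n + 1))


def bound_4_8(n, B=1):
    """Bound on the remainder integral in (4.6), with B assumed equal to 1."""
    return B * K(n) * integral_abs_s_weight(n)


print([K(n) * integral_weight(n) for n in range(5)])
print([float(bound_4_8(n)) for n in (10, 100, 400)])
```

The first line shows that the weight integrates to one, so that H(t, 0) passes through (4.6) unchanged; the second shows the bound decreasing roughly like the reciprocal of the square root of n.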

The above lemma indicates that if the asymptotic pdf of $z_M$ exists, it will be a
"stable" distribution in the normal sense [2]. In order to set the stage for proving
the existence of this asymptotic distribution we shall first investigate the third
logarithmic derivative of $G_n(t; z_m)$.

5. Investigation of third logarithmic derivative. We shall now show that the
third logarithmic derivative of G is uniformly bounded in some neighborhood of
t = 0. We first prove that the absolute value of the third derivative of G is
bounded for all t and n. Now the third derivative will have absolute value less
than the third absolute moment which I denote by $\beta_3$. Using Liapounoff's
inequality

(5.1) $\beta_3^2 \le \mu_2\,\mu_4$

one asks whether the fourth moment $\mu_4$ remains finite as n and m become infinite.
Computation of the fourth moment about the mean appears to be somewhat
formidable. However it is not so difficult to show that it remains finite with
increasing m and n. Referring to previous paper ([1] formulas (4.8)-(4.10))
we use quasi-moment generating function $g_0(x)$ such that

(5.2) $d^r g_0(0)/dx^r = \Gamma(pr + 1), \quad g_0(0) = 1,$

and it follows that

(5.3) $E\left(\sum_{m} u_i^p\right)^r = \left\{d^r [g_0(0)]^m/dx^r\right\}\, \Gamma(n+1)/\Gamma(n + 1 + pr),$

and one recalls that

$y = \sum_{m} u_i^p, \qquad \bar{y} = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+1+p)},$

with

$z = [(n + 1)^p/\sqrt{m}]\,[y - \bar{y}].$

The resulting fourth moment of z will be in the form of a fourth degree polynomial
in m whose coefficients are of the type

$\dfrac{(n+1)^{4p}\,\Gamma(n+1)}{\Gamma(n+1+4p)}, \quad \dfrac{(n+1)^{3p}\,\Gamma(n+1)}{\Gamma(n+1+3p)}, \quad \cdots,$

combined with the first moment, with $m^{-2}$ appearing as a factor. By expansion of
the Gamma function in asymptotic series in (n + 1) it is not difficult to show
that the coefficient of $m^4$ becomes asymptotic like $(n + 1)^{-2}$, and that the
coefficient of $m^3$ becomes asymptotic like $(n + 1)^{-1}$. It follows that as n and m
go to infinity with m ~ c(n + 1), this fourth moment approaches a finite
limit. Hence one concludes that the third derivative of G has bounded absolute
value for all n and t.

Since the absolute value of the first derivative of G is uniformly bounded for
finite t and all n it follows from the properties of a characteristic function that
given a positive number K less than unity, it is possible to find a value of $t = t_0$
greater than zero such that

(5.4) $0 < K < |G_n(t; z)| \le 1, \quad 0 \le |t| \le t_0,$

for all n.

From the above double inequality and the fact that the absolute values of the
first three derivatives are uniformly bounded it follows that the third logarithmic
derivative of G is uniformly bounded for all n on the interval

(5.5) $0 \le |t| \le t_0.$

6. Proof that the asymptotic distribution of z exists and is normal. Since
absolute value of G is uniformly bounded away from zero on interval (5.5) one
can write the functional relation (4.9) as

(6.1) $\log G_N(t; z_M) = 2 \log G_n(t/\sqrt{2}; z_m) + o(1),$

where o(1) goes to zero with increasing n uniformly for t on interval (5.5).
Introduce the notation:

λ(n) equals variance of $z_m$,

q(t, n) equals third logarithmic derivative of $G_n(t; z_m)$,

R(t, N) equals remainder defined by

(6.2) $\log G_N(t; z_M) = -\lambda(N)\,t^2/2 + R(t, N).$

Write

(6.3) $\log G_n(t/\sqrt{2}; z_m) = -\lambda(n)\,t^2/4 + q(\theta t/\sqrt{2}, n)\,t^3/(12\sqrt{2}), \quad 0 < \theta < 1.$

Substituting (6.2) and (6.3) in (6.1)

(6.4) $R(t, N) = [\lambda(N) - \lambda(n)]\,t^2/2 + [1/\sqrt{2}]\,q(\theta t/\sqrt{2}, n)\,(t^3/6) + o(1).$

By (3.9)

(6.5) $\lim \lambda(n) = \lim \lambda(N) =$ positive number λ.

We have proved that there exists an upper bound U such that

(6.6) $|q(t, n)| < U$

for all n and for t on interval

(6.7) $0 \le |t| \le t_0.$

Hence from (6.4) one can reason that given a positive ε, a number $N_0$ can be
found such that

(6.8) $|R(t, N)| < [1/\sqrt{2}]\,U\,|t^3/6| + \epsilon$

for all t on (6.7) and for $N > N_0$.

By (6.1)

(6.9) $R(t, 2N + 1) = [\lambda(2N + 1) - \lambda(N)]\,t^2/2 + 2R(t/\sqrt{2}, N) + o(1).$

Using (6.8)

$|R(t/\sqrt{2}, N)| < [1/\sqrt{2}]\,U\,|t^3/(12\sqrt{2})| + \epsilon.$

Hence for any positive number $\epsilon_2$ a number $N_1$ can be found such that

$|R(t, N)| < (1/2)\,U\,|t^3/6| + 2\epsilon + \epsilon_2, \quad N > N_1,$

for all t on (6.7). After k such operations, taking $\epsilon_i = \epsilon$

(6.10) $|R(t, N)| < (1/2)^{k/2}\,U\,|t^3/6| + (2^k - 1)\,\epsilon, \quad N > N_k.$

Thus given a positive number d one can determine k such that

$(1/2)^{k/2}\,U\,t_0^3/6 < d/2,$

and ε such that

$2^k \epsilon < d/2,$

and therefore a number $N_{k+1}$ such that

(6.11) $|R(t, N)| < d, \quad N > N_{k+1}$

for all t on interval (6.7).

It follows that $G_N(t; z_M)$ converges uniformly to $\exp(-\lambda t^2/2)$ on interval (6.7).
Convergence of $G_N(t; z_M)$ for a value $t = t_1$ outside the interval (6.7) may be
proved by choosing integer k such that

(6.12) $0 < |t_1|/(\sqrt{2})^k < t_0,$

and taking

$t_2 = t_1/(\sqrt{2})^k.$

Recalling that the functional relation (4.9) holds for all finite t, this can be
applied k times, thus building up $t_2$ to $t_1$.
It follows from the continuity theorem that the distribution function of $z_M$
converges to the normal distribution function.

7. Statement of theorem proved. The proof given above has involved the
restriction that N be odd and M even (see (2.7)). This restriction is required
for the integral relationship (3.6). However, if N were even one could take
$n_0 = N/2$ and $n_1 = n_0 - 1$ and deal with $G_{n_0}$ and $G_{n_1}$ in the integrand. Also
if M were odd, one could take $m_0 = (M + 1)/2$, $m_1 = m_0 - 1$, and deal with
$G_{n_0}(t; z_{m_0})$ and $G_{n_1}(t; z_{m_1})$ in the integrand. This would of course carry with it
corresponding changes in the Gamma functions which precede the integral.
As long as we require that

$N = n_0 + n_1 + 1, \quad M = m_0 + m_1,$

$\lim M/N = \lim m_0/n_0 = \lim m_1/n_1 = c > 0,$

the arguments used in arriving at the asymptotic relations (3.15) and (4.9)
will apply. Hence the theorem:

Theorem 2. For a one dimensional statistical universe whose cdf is continuous,
consider the function of the unit frequency differences $u_i$

(7.1) $y = \sum_{m} u_i^p$

taken from an ordered random sample of size n (see (2.2)) where p is any real
positive number, and m is any positive integer less than or equal to n + 1. The
selection of which m unit frequencies are to be included is arbitrary. Then with

(7.2) $\bar{y} = E(y) = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+p+1)}$

consider the partially normalized variable

(7.3) $z = \dfrac{(n+1)^p}{\sqrt{m}}\,(y - \bar{y}).$

If n goes to infinity, with m becoming infinite so that

(7.4) $\lim m/n = c > 0,$

then the asymptotic cumulative distribution of z exists and is normal, with

(7.5) $\lim E(z^2) = \Gamma(2p+1) - \Gamma^2(p+1) - cp^2\,\Gamma^2(p+1),$

except in the trivial case p = 1, m = n + 1, in which case z ≡ 0, and in the case
p = 1, c = 1.
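A small simulation gives a rough feel for the theorem. The sketch below is an illustration under stated assumptions, not a verification: it takes a uniform universe (so F(x) = x), the case p = 2 with m = n + 1, a fixed seed, and the hypothetical helper `z_statistic`; the sample variance of z should then lie near the limit 4 given by (7.5).

```python
import random
from math import exp, lgamma, sqrt


def z_statistic(n, m, p, rng):
    """Partially normalized variable (7.3) for one uniform sample of size n."""
    xs = sorted(rng.random() for _ in range(n))
    u = [xs[0]] + [xs[i + 1] - xs[i] for i in range(n - 1)] + [1 - xs[-1]]
    y = sum(ui ** p for ui in u[:m])          # first m differences (arbitrary choice)
    y_bar = m * exp(lgamma(n + 1) + lgamma(p + 1) - lgamma(n + p + 1))  # (7.2)
    return (n + 1) ** p / sqrt(m) * (y - y_bar)


rng = random.Random(1950)  # fixed seed for reproducibility
n, p = 200, 2.0
zs = [z_statistic(n, n + 1, p, rng) for _ in range(2000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1)
print(mean_z, var_z)
```

With n = 200 the distribution of z is still visibly skewed, so agreement with the normal limit should only be expected to be approximate.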

REFERENCES

[1] B. F. Kimball, "Some basic theorems for developing tests of fit for the case of the
non-parametric probability distribution, I", Annals of Math. Stat., Vol. 18 (1947),
No. 4, pp. 540-548.

[2] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937,
Chapter V.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.

[4] P. A. P. Moran, "Random Division of an Interval", Jour. Roy. Stat. Soc., Suppl.,
Vol. 9 (1947), pp. 92-98.

2 For the case p = 2, m = n + 1, an interesting proof was published by P. A. P. Moran
in 1947, see [4].



EFFECT OF LINEAR TRUNCATION ON A MULTINORMAL POPULATION' 

By Z. W. Birnbaum

University of Washington 

1. Introduction. In admission to educational institutions, personnel selection,
testing of materials, and other practical situations, the following mathematical
model is frequently encountered: A (k + l)-dimensional random variable
$(X_1, X_2, \cdots, X_k, Y_1, Y_2, \cdots, Y_l) = (X, Y)$ is considered, with a joint probability
distribution assumed to be non-singular multi-normal. The $Y_1, Y_2, \cdots, Y_l$ are
scores in admission tests, the $X_1, X_2, \cdots, X_k$ scores in achievement tests. The
admission tests are administered to all individuals in the (X, Y) population
to decide on admission or rejection, and (usually at some later time) the achievement
tests are administered to those admitted. A set of weights $a_j > 0$, j =
1, 2, ..., l is used to define a composite admission test score $U = \sum_{j=1}^{l} a_j Y_j$
and a "cutting score" r is chosen so that an individual is admitted if U > r,
and rejected if U < r. We will refer to this procedure as linear truncation of
(X, Y) in Y to the set U > r.

A linear truncation in Y clearly will change the absolute distribution of X, 
except in the case of independence. In this paper a study is made of the absolute 
distribution of X after linear truncation in Y in the ease k ~ 1; in particular, 
the possibility is investigated of choosing the a, and r in such a way that the 
distribution of X after truncation has certain desirable properties. The ease 
fc > 1 leads to a considerable diversity of problems which are. being studied and, 
it is hoped, will be the subject of a separate pape.r. 

Throughout this paper it will be assumed that all the parameters of (X, Y), 
that is the expectations, variances and Covariances before truncation, are 
‘ icnown. In practical situations it often happens that only the parameters of 
Y 1 , IT, ■ ■ ,Yi before truncation are known, while the first and second moments 
involving Xi, Xi, ■ • • , X* are only known for the joint distribution after 
truncation. It can be shown [1] that in such situations the expectations, variances 
and covariances of (X, Y) before truncation can always be reconstructed if 
(X, Y) has a multinormal distribution. 

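In the simplest case $k = l = 1$ the probability-density of the original bi-normal random variable $(X, Y)$ may be, without loss of generality, assumed equal to

$$f(X, Y; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\frac{X^2 - 2\rho XY + Y^2}{2(1 - \rho^2)}\right\}. \tag{1.1}$$

By truncating this distribution in $Y$ to the set $Y \geq \tau$ one obtains the probability-density

$$g(X, Y; \rho, \tau) = \begin{cases} \psi^{-1}(\tau)\, f(X, Y; \rho), & \text{for } Y \geq \tau, \\ 0, & \text{for } Y < \tau, \end{cases} \tag{1.2}$$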

1 Presented to the Institute of Mathematical Statistics on June 18, 1949.

2 Research done under the sponsorship of the Office of Naval Research.



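where

$$\psi(\tau) = \frac{1}{\sqrt{2\pi}} \int_{\tau}^{\infty} e^{-t^2/2}\, dt. \tag{1.3}$$

For further use we introduce the abbreviations

$$\varphi(\tau) = \frac{1}{\sqrt{2\pi}}\, e^{-\tau^2/2}, \tag{1.4}$$

$$\lambda(\tau) = \frac{\varphi(\tau)}{\psi(\tau)}. \tag{1.5}$$

We also note the inequalities

$$\tau < \lambda(\tau) \tag{1.6}$$

and

$$\lambda(\tau) < \frac{2}{\sqrt{4 + \tau^2} - \tau} = \frac{\sqrt{4 + \tau^2} + \tau}{2}, \tag{1.7}$$

derived in [2] and [3], respectively. 3

Before proceeding to the more-dimensional case, we will study some properties of the marginal probability-distribution of $X$ after truncation to $Y \geq \tau$

$$\varphi(X; \rho, \tau) = \int_{\tau}^{\infty} g(X, Y; \rho, \tau)\, dY. \tag{1.8}$$

2. The moments of $\varphi(X; \rho, \tau)$. In this section all mathematical expectations are computed for the absolute distribution of $X$ after truncation of $(X, Y)$ to $Y \geq \tau$.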

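3 Implicitly, the inequality (1.6) was known already to Laplace, cf. Mécanique Céleste, transl. by Bowditch, Boston 1839, Vol. 4, p. 493.

We have

$$\varphi(X; \rho, \tau) = \psi^{-1}(\tau)\, \varphi(X)\, \psi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right)$$

and hence, integrating by parts with $X\varphi(X) = -\varphi'(X)$,

$$E(X^n) = \int_{-\infty}^{+\infty} X^n \varphi(X; \rho, \tau)\, dX = \psi^{-1}(\tau) \int_{-\infty}^{+\infty} X^n\, \frac{1}{\sqrt{2\pi}}\, e^{-X^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{(\tau - \rho X)/\sqrt{1 - \rho^2}}^{\infty} e^{-S^2/2}\, dS\, dX$$

$$= (n - 1)\, E(X^{n-2}) + \frac{\rho}{\psi(\tau)\sqrt{1 - \rho^2}} \int_{-\infty}^{+\infty} X^{n-1}\, \varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX.$$

From the identity

$$\varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) = \varphi(\tau)\, \varphi\!\left(\frac{X - \rho\tau}{\sqrt{1 - \rho^2}}\right) \tag{2.0}$$

we obtain

$$\int_{-\infty}^{+\infty} X^{n-1}\, \varphi(X)\, \varphi\!\left(\frac{\tau - \rho X}{\sqrt{1 - \rho^2}}\right) dX = \sqrt{1 - \rho^2}\, \varphi(\tau) \int_{-\infty}^{+\infty} \left(S\sqrt{1 - \rho^2} + \rho\tau\right)^{n-1} \varphi(S)\, dS,$$

and hence

$$E(X^n) = (n - 1)\, E(X^{n-2}) + \rho\lambda(\tau) \int_{-\infty}^{+\infty} \left(S\sqrt{1 - \rho^2} + \rho\tau\right)^{n-1} \varphi(S)\, dS, \tag{2.1}$$

for $n \geq 1$.

For $n = 1$ this yields for the expectation of $X$ after truncation

$$E(X) = \rho\lambda(\tau). \tag{2.2}$$

For $n = 2$ we have from (2.1)

$$E(X^2) = 1 + \rho^2 \tau \lambda(\tau) = 1 + \rho\tau E(X),$$

and hence for the variance of $X$ after truncation the expression

$$\sigma^2(X) = 1 + E(X)[\rho\tau - E(X)], \tag{2.3}$$

or

$$\sigma^2(X) = 1 - \rho^2 \lambda(\tau)[\lambda(\tau) - \tau]. \tag{2.31}$$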

From (2.2) we see that $E(X)$ always has the sign of $\rho$, as one would expect. From (2.3) one finds a lower bound for $\tau$

$$\tau > \frac{E^2(X) - 1}{\rho E(X)}. \tag{2.4}$$

From (2.31) and (1.6) one concludes that $\sigma^2(X) < 1$ for $\rho \neq 0$, hence the variance of $X$ after truncation is always less than the variance before truncation, except if $\rho = 0$.
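The closed forms (2.2) and (2.31) can be checked against a direct numerical integration of the marginal density after truncation. The sketch below is illustrative only and not part of the paper; the helper names and the midpoint-rule settings (`steps`, `half_width`) are assumptions chosen for the example.

```python
import math


def phi(t: float) -> float:
    """Standard normal density, as in (1.4)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def psi(t: float) -> float:
    """Upper tail probability of the standard normal, as in (1.3)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def lam(t: float) -> float:
    """lambda(tau) = phi(tau) / psi(tau), as in (1.5)."""
    return phi(t) / psi(t)


def truncated_mean(rho: float, tau: float) -> float:
    """E(X) after truncation, formula (2.2)."""
    return rho * lam(tau)


def truncated_variance(rho: float, tau: float) -> float:
    """Variance of X after truncation, formula (2.31)."""
    l = lam(tau)
    return 1.0 - rho ** 2 * l * (l - tau)


def moment_by_quadrature(n: int, rho: float, tau: float,
                         steps: int = 4000, half_width: float = 10.0) -> float:
    """n-th moment of X after truncation by the midpoint rule on the marginal density."""
    h = 2.0 * half_width / steps
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += x ** n * phi(x) * psi((tau - rho * x) / scale)
    return total * h / psi(tau)


rho, tau = 0.6, 0.5
m1 = moment_by_quadrature(1, rho, tau)
m2 = moment_by_quadrature(2, rho, tau)
print(truncated_mean(rho, tau), m1)
print(truncated_variance(rho, tau), m2 - m1 * m1)
```

The two printed pairs should agree to quadrature accuracy, and the variance should be below 1 as stated after (2.4).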

Similarly one computes from (2.1) the third moment about zero

$$E(X^3) = E(X)[3 - \rho^2(1 - \tau^2)]$$

and obtains for the third moment about the expectation

$$E[X - E(X)]^3 = E(X)\, \rho^2 \{[\lambda(\tau) - \tau][2\lambda(\tau) - \tau] - 1\}. \tag{2.5}$$

Numerical computation indicates that the quantity in braces is always $> 0$, which would mean that the skewness of $X$ after truncation has the same sign as $E(X)$ and $\rho$. No analytic proof of this statement has been obtained.
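The numerical observation about the quantity in braces in (2.5) can be reproduced on a grid of cutting scores. This is only a spot check under assumed settings (the grid from -6 to 6 in steps of 0.1 is an arbitrary choice), not a proof, and the function names are illustrative.

```python
import math


def lam(tau: float) -> float:
    """lambda(tau) = phi(tau) / psi(tau) for the standard normal."""
    density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return density / upper_tail


def brace_quantity(tau: float) -> float:
    """The quantity in braces in (2.5)."""
    l = lam(tau)
    return (l - tau) * (2.0 * l - tau) - 1.0


grid = [k / 10.0 for k in range(-60, 61)]
smallest = min(brace_quantity(t) for t in grid)
print(smallest)  # positive on this grid; it shrinks toward 0 as tau grows
```

On this grid the minimum is attained at the largest cutting score, which suggests that the quantity tends to zero from above for large $\tau$.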


3. Determination of $\tau$ for given expectation or quantile of $X$ after truncation; dependence of this $\tau$ on $\rho$. Let it be required to determine $\tau$ so that the expectation of $X$ after truncation assumes a given value $m$. It follows immediately from (2.2) that this $\tau$ is obtained by solving the equation

$$\lambda(\tau) = \frac{m}{\rho} \tag{3.1}$$

for $\tau$, which can be done with the aid of a table 4 of $\lambda(\tau)$.
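In place of a table, equation (3.1) can be solved numerically. The sketch below uses bisection, which is applicable because $\lambda(\tau)$ is strictly increasing (its derivative $\lambda(\tau)[\lambda(\tau) - \tau]$ is positive by (1.6)). The function name `cutting_score_for_mean` and the bracket $[-10, 30]$ are assumptions made for this illustration.

```python
import math


def lam(tau: float) -> float:
    """lambda(tau) = phi(tau) / psi(tau) for the standard normal."""
    density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return density / upper_tail


def cutting_score_for_mean(m: float, rho: float,
                           lo: float = -10.0, hi: float = 30.0,
                           iterations: int = 200) -> float:
    """Solve lambda(tau) = m / rho, equation (3.1), by bisection."""
    target = m / rho
    if target <= 0.0:
        raise ValueError("(3.1) has a solution only if sign rho = sign m")
    if not lam(lo) < target < lam(hi):
        raise ValueError("m / rho lies outside the assumed bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lam(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


tau = cutting_score_for_mean(m=0.4, rho=0.5)
print(tau, 0.5 * lam(tau))  # the second value reproduces m = 0.4
```

The sign check mirrors the remark after (2.2) that $E(X)$ always has the sign of $\rho$.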

Another problem which occurs in applications consists in determining $\tau$ so that, for given $0 < \alpha < 1$ and $X_\alpha$, the $\alpha$-quantile for $X$ after truncation assumes the value $X_\alpha$, that is so that

$$\int_{-\infty}^{X_\alpha} \varphi(X; \rho, \tau)\, dX = \psi^{-1}(\tau) \int_{-\infty}^{X_\alpha} \int_{\tau}^{\infty} f(X, Y; \rho)\, dY\, dX = \alpha. \tag{3.2}$$

Let

$$P(s, t; \rho) = \int_{s}^{\infty} \int_{t}^{\infty} f(X, Y; \rho)\, dY\, dX \tag{3.21}$$

denote the volume of the probability solid $Z = f(X, Y; \rho)$ above the quadrant $X \geq s$, $Y \geq t$. Then (3.2) may be written in the form

$$1 - \frac{P(X_\alpha, \tau; \rho)}{\psi(\tau)} = \alpha$$

or

$$(1 - \alpha)\psi(\tau) = P(X_\alpha, \tau; \rho), \tag{3.3}$$

and this equation can be solved for $\tau$ by trial with the aid of tables of $\psi(\tau)$ and Pearson's tables [4] of $P(s, t; \rho)$.
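The solution by trial with tables can be imitated numerically. In the sketch below, $P(s, t; \rho)$ is computed from the one-dimensional form $\int_t^\infty \varphi(Y)\,\psi\big((s - \rho Y)/\sqrt{1-\rho^2}\big)\,dY$ by Simpson's rule, and (3.3) is then solved by bisection. The function names, the integration cut-off, and the bracket $[-6, 6]$ are assumptions for this illustration; the example takes $\rho > 0$ and values for which a root exists in the bracket.

```python
import math


def phi(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def psi(t: float) -> float:
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def quadrant_probability(s: float, t: float, rho: float, steps: int = 2000) -> float:
    """P(s, t; rho) of (3.21) by Simpson's rule (steps must be even)."""
    scale = math.sqrt(1.0 - rho * rho)
    upper = max(t, 0.0) + 10.0  # the normal density is negligible beyond this point
    h = (upper - t) / steps

    def g(y: float) -> float:
        return phi(y) * psi((s - rho * y) / scale)

    total = g(t) + g(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(t + i * h)
    return total * h / 3.0


def cutting_score_for_quantile(x_alpha: float, alpha: float, rho: float,
                               lo: float = -6.0, hi: float = 6.0,
                               iterations: int = 60) -> float:
    """Solve (1 - alpha) psi(tau) = P(x_alpha, tau; rho), equation (3.3), by bisection."""
    def excess(tau: float) -> float:
        return quadrant_probability(x_alpha, tau, rho) / psi(tau) - (1.0 - alpha)

    f_lo, f_hi = excess(lo), excess(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("no sign change in the assumed bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = excess(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Cutting score that makes the median of X after truncation equal to 0.5.
tau_a = cutting_score_for_quantile(x_alpha=0.5, alpha=0.5, rho=0.5)
tau_b = cutting_score_for_quantile(x_alpha=0.5, alpha=0.5, rho=0.8)
print(tau_a, tau_b)
```

The quadrature can be checked against the classical value $P(0, 0; \rho) = \tfrac14 + \arcsin\rho / (2\pi)$.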

Lemma 1. For fixed expectation of $X$ after truncation $E(X) = m$, the solution $\tau(\rho)$ of (3.1) is a strictly decreasing function of the absolute value of $\rho$ for $0 < |\rho| < 1$.

Proof: Differentiating $m = \rho\lambda(\tau)$ with regard to $\rho$ one obtains

$$0 = \lambda(\tau) + \rho\lambda'(\tau)\frac{d\tau}{d\rho}$$

and, in view of the identity

$$\lambda'(\tau) = \lambda(\tau)[\lambda(\tau) - \tau],$$

the expression

$$\frac{d\tau}{d\rho} = -\frac{1}{\rho[\lambda(\tau) - \tau]}. \tag{3.4}$$

4 A table of $1/\lambda(\tau)$ is, for example, given in Karl Pearson, Tables for Statisticians and Biometricians, Part II, 1931, pp. 11-15.





From (3.4) and (1.6) we see that

$$\operatorname{sign} \frac{d\tau}{d\rho} = -\operatorname{sign} \rho,$$

which proves our lemma.

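Lemma 2. For fixed $\alpha$, $X_\alpha$, the solution $\tau = \tau(\rho)$ of (3.3) is a strictly decreasing function of $|\rho|$ for $0 < |\rho| < 1$.

Proof: Differentiating (3.3) with regard to $\rho$ one obtains

$$-(1 - \alpha)\varphi(\tau)\frac{d\tau}{d\rho} = \frac{\partial P}{\partial \tau}\frac{d\tau}{d\rho} + \frac{\partial P}{\partial \rho} \tag{3.5}$$

and hence

$$\frac{d\tau}{d\rho} = -\frac{\dfrac{\partial P}{\partial \rho}}{\dfrac{\partial P}{\partial \tau} + (1 - \alpha)\varphi(\tau)}. \tag{3.6}$$

From (3.21) one easily verifies that

$$\frac{\partial P(s, t; \rho)}{\partial \rho} = \varphi(t)\,(1 - \rho^2)^{-1/2}\, \varphi\!\left(\frac{s - \rho t}{\sqrt{1 - \rho^2}}\right)$$

and therefore

$$\frac{\partial P(X_\alpha, \tau; \rho)}{\partial \rho} > 0. \tag{3.7}$$

One also computes

$$\frac{\partial P(X_\alpha, \tau; \rho)}{\partial \tau} = -\varphi(\tau)\, \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)$$

so that the denominator of the right hand expression in (3.6) becomes

$$\varphi(\tau)\left[1 - \alpha - \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right].$$

In view of (3.3) this is equal to

$$\varphi(\tau)\left[\frac{P(X_\alpha, \tau; \rho)}{\psi(\tau)} - \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right] = \lambda(\tau)\left[P(X_\alpha, \tau; \rho) - \psi(\tau)\, \psi\!\left(\frac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}\right)\right] = \lambda(\tau)\int_{\tau}^{\infty} h(Y)\, dY,$$

where

$$h(Y) = \varphi(Y) \int_{(X_\alpha - \rho Y)/\sqrt{1 - \rho^2}}^{(X_\alpha - \rho\tau)/\sqrt{1 - \rho^2}} \varphi(t)\, dt.$$

If $\rho > 0$, then $\rho Y > \rho\tau$ in the interval of integration $\tau < Y < \infty$, hence $\dfrac{X_\alpha - \rho Y}{\sqrt{1 - \rho^2}} < \dfrac{X_\alpha - \rho\tau}{\sqrt{1 - \rho^2}}$, therefore the integrand $h(Y)$ is positive, and so is the denominator of (3.6). Similarly one sees that if $\rho < 0$ the integrand $h(Y)$ is negative for $\tau < Y < \infty$ and the denominator of (3.6) is negative. In view of (3.7) we conclude

$$\operatorname{sign} \frac{d\tau}{d\rho} = -\operatorname{sign} \rho \quad \text{for } \rho \neq 0.$$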


4. Linear truncation of $(X, Y_1, Y_2, \cdots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j \geq \tau$ for given expectation or quantile of $X$, minimizing the rejected part of the population. Let $(X, Y_1, Y_2, \cdots, Y_l)$ be an $(l + 1)$-dimensional non-singular normal random variable with all expectations, variances and covariances known. We wish to choose $a_1, a_2, \cdots, a_l$ and $\tau$ so that by setting

$$U = \sum_{j=1}^{l} a_j Y_j \tag{4.1}$$

and performing the linear truncation to the set $U \geq \tau$ we obtain for the expectation of $X$ after truncation a pre-assigned value $m$, and that this is achieved with the least waste of the original population, that is so that for the non-truncated probability-distribution the probability $P\left(\sum_{j=1}^{l} a_j Y_j < \tau\right)$ is minimum.

Without loss of generality we may assume that, before truncation, we have

$$E(X) = E(Y_1) = \cdots = E(Y_l) = 0, \tag{4.21}$$

$$\sigma^2(X) = 1, \tag{4.22}$$

and thus

$$E(U) = 0. \tag{4.3}$$

Furthermore, the $a_j$ and $\tau$ can always be multiplied by a constant, without changing the set of truncation, so that we have

$$\sigma^2(U) = 1. \tag{4.4}$$

Theorem 1. To truncate $(X, Y_1, Y_2, \cdots, Y_l)$ linearly in $Y_1, Y_2, \cdots, Y_l$ so that the expectation of $X$ after truncation has the given value $m$ and that the probability of the rejected part of the original population is minimum, it is necessary and sufficient (1) to determine $a_1, a_2, \cdots, a_l$ so that the absolute value of the correlation-coefficient $\rho(X, U)$ becomes maximum under the condition (4.4), and (2) for $U$ determined by these $a_1, a_2, \cdots, a_l$ and for $\rho = \rho(X, U)$ to solve equation (3.1) for $\tau$.

The proof of this theorem follows immediately from the first paragraph of section 3 and Lemma 1.

Using the second paragraph of section 3 and Lemma 2, one equally easily arrives at the following theorem:

Theorem 2. To truncate $(X, Y_1, Y_2, \cdots, Y_l)$ linearly in $Y_1, Y_2, \cdots, Y_l$ so that the $\alpha$-quantile of $X$ after truncation has the given value $X_\alpha$ and that the probability of the rejected part of the original population is minimum, it is necessary and sufficient to satisfy (1) in Theorem 1 and then to solve equation (3.3).

The problem of satisfying requirement (1) of Theorems 1 and 2 can be solved effectively by a method due to Hotelling [5]. It may be worth noting that this method yields two sets of constants, $a_1, a_2, \cdots, a_l$ and $-a_1, -a_2, \cdots, -a_l$, both maximizing $|\rho(X, U)|$ but leading to values of $\rho(X, U)$ with opposite signs. Nevertheless the choice between $a_1, a_2, \cdots, a_l$ and $-a_1, -a_2, \cdots, -a_l$ and the determination of $\tau$ are unique for any given $m$, since (3.1) has a solution for $\tau$ only if $\operatorname{sign} \rho = \operatorname{sign} m$.
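For a single variable $X$, requirement (1) can be illustrated without the general machinery of [5]: by the Cauchy-Schwarz inequality the weights maximizing $|\rho(X, U)|$ are proportional to $\Sigma_{YY}^{-1}\sigma_{XY}$, the classical regression weights. The sketch below uses that shortcut rather than Hotelling's procedure itself; the names `solve_linear` and `best_weights` and the covariance values are assumptions for this example, and $\sigma^2(X) = 1$ is assumed as in (4.22).

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def best_weights(cov_yy, cov_xy, m):
    """Weights a with sigma^2(U) = 1 maximizing |rho(X, U)|, signed so that sign rho = sign m."""
    raw = solve_linear(cov_yy, cov_xy)                      # proportional to the maximizer
    r = math.sqrt(sum(w * c for w, c in zip(raw, cov_xy)))  # multiple correlation of X on Y
    sign = 1.0 if m > 0 else -1.0
    weights = [sign * w / r for w in raw]                   # normalized so that (4.4) holds
    return weights, sign * r


# Hypothetical covariances of (Y1, Y2) and of X with (Y1, Y2).
a, rho = best_weights([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.3], m=0.4)
print(a, rho)
```

The resulting $\rho$ would then be used in (3.1) or (3.3) to determine $\tau$.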


5. Linear truncation of $(X, Y_1, Y_2, \cdots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j \geq \tau$ for given expectation of $X$ after truncation, minimizing the variance of $X$ after truncation.

It may be of practical interest to choose $a_1, a_2, \cdots, a_l$ and $\tau$ so that, with the notations and under the assumptions of section 4, the expectation of $X$ after truncation becomes equal to a given number $m$, and the variance after truncation is minimum.

Theorem 3. To truncate $(X, Y_1, Y_2, \cdots, Y_l)$ linearly in $Y_1, Y_2, \cdots, Y_l$ so that the expectation after truncation has the given value $m$ and that, under this condition, the variance of $X$ after truncation becomes as small as possible, it is necessary and sufficient to satisfy the conditions (1) and (2) of Theorem 1.

The proof of this theorem follows from section 3 and the following lemma:

Lemma 3. For fixed $E(X) = m$, the variance $\sigma^2(X)$ after truncation is a strictly decreasing function of the absolute value of $\rho$ for $0 < |\rho| < 1$.
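Proof: According to (2.3) we have

$$\sigma^2(X) = 1 + m(\rho\tau - m).$$

Differentiating with regard to $\rho$ and using (3.4) we have

$$\frac{d\sigma^2(X)}{d\rho} = m\left(\tau + \rho\frac{d\tau}{d\rho}\right) = m\, \frac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau}.$$

For $\tau \leq 0$ the numerator $\tau[\lambda(\tau) - \tau] - 1$ clearly is $< 0$. For $\tau > 0$ inequality (1.7), in the form $\lambda(\tau) < \tfrac{1}{2}\left(\sqrt{4 + \tau^2} + \tau\right)$, yields

$$\tau[\lambda(\tau) - \tau] - 1 < \tfrac{1}{2}\left(\tau\sqrt{4 + \tau^2} - \tau^2 - 2\right),$$

and this is $< 0$ for $\tau > 0$, since $\tau^2(4 + \tau^2) < (\tau^2 + 2)^2$. Together with (1.6), this proves that

$$\frac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau} < 0$$

for all $\tau$, and hence according to (3.1)

$$\operatorname{sign} \frac{d\sigma^2(X)}{d\rho} = -\operatorname{sign} m = -\operatorname{sign} \rho.$$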
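Lemma 3 can be illustrated numerically: for a fixed expectation $m$ one solves (3.1) for $\tau$ at several values of $\rho$ and evaluates the variance from (2.3). The sketch below is an illustration on assumed values ($m = 0.4$ and four values of $\rho$), with hypothetical function names; it does not replace the proof.

```python
import math


def lam(tau: float) -> float:
    """lambda(tau) = phi(tau) / psi(tau) for the standard normal."""
    density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    return density / (0.5 * math.erfc(tau / math.sqrt(2.0)))


def tau_for_mean(m: float, rho: float) -> float:
    """Solve lambda(tau) = m / rho, equation (3.1), by bisection on an assumed bracket."""
    lo, hi = -10.0, 30.0
    target = m / rho
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lam(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_at_fixed_mean(m: float, rho: float) -> float:
    """Variance after truncation from (2.3) with E(X) = m."""
    return 1.0 + m * (rho * tau_for_mean(m, rho) - m)


rhos = (0.3, 0.5, 0.7, 0.9)
variances = [variance_at_fixed_mean(0.4, r) for r in rhos]
print(variances)  # decreasing as rho increases
```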





It may be conjectured that the sign of $d\sigma^2(X)/d\rho$ is opposite to that of $\rho$ also in the case when $\sigma^2(X)$ is the variance after truncation minimized under condition (3.3). This would lead to a theorem stating that the same choice of $a_1, a_2, \cdots, a_l$ and $\tau$ which according to Theorem 2 makes the $\alpha$-quantile after truncation equal to the given number $X_\alpha$ and minimizes the rejected part of the original population, will also minimize the variance of $X$ after truncation.


REFERENCES

[1] Z. W. Birnbaum, E. Paulson and F. C. Andrews, "On the effect of selection performed on some coordinates of a multi-dimensional population", Psychometrika, Vol. 15 (1950).

[2] R. D. Gordon, "Values of Mill's ratio of area to bounding ordinate of the normal probability integral for large values of the argument", Annals of Math. Stat., Vol. 12 (1941), pp. 364-366.

[3] Z. W. Birnbaum, "An inequality for Mill's ratio", Annals of Math. Stat., Vol. 13 (1942), pp. 245-246.

[4] K. Pearson, Tables for Statisticians and Biometricians, Part II, 1st ed., Cambridge Univ. Press, 1931, Tables VIII and IX.

[5] H. Hotelling, "Relations between two sets of variates", Biometrika, Vol. 28 (1936), pp. 321-377.



NOTES 

This section is devoted to brief research and expository articles and other short items.


EXTENSION OF A THEOREM OF BLACKWELL 1
By E. W. Barankin
University of California, Berkeley 

1. Introduction. In [1] (§1) the author has announced, as bearing on the results there, that Blackwell's method [2] of uniformly improving the variance of an unbiased estimate by taking the conditional expectation with respect to a sufficient statistic, is in fact similarly effective on every absolute central moment of order $s \geq 1$. Our purpose here is to establish this. In addition, the equality condition (null improvement of the moment) is presented in terms of a primitive property of the estimate. The asserted uniform diminution of the $s$-th moments for a family $\mathfrak{M}$ of distributions is, as in the case $s = 2$, a twice removed consequence of the fundamental fact for a single distribution that the absolute $s$-th power of the conditional expectation of a measurable function is almost everywhere (a.e.) not greater than the conditional expectation of the absolute $s$-th power of the function. This is the substance of the theorem below. The second corollary then states the result for unbiased estimates.

2. Preliminaries. Let $\Omega$ be a space of points $x$; $\mathfrak{F}$, a $\sigma$-field of subsets of $\Omega$; and $\mu$, a probability measure on $\mathfrak{F}$. Let $t$ be a function on $\Omega$ onto a space $T$ of points $\tau$; $\mathfrak{T}^\tau$, a $\sigma$-field of subsets of $T$; and $\mathfrak{T}$, a sub-$\sigma$-field of $\mathfrak{F}$, the inverse of $\mathfrak{T}^\tau$ under $t$. A set in $\mathfrak{T}^\tau$ will be denoted by $A^\tau$, where $A$ is its inverse under $t$. Let $\nu$ denote the measure on $\mathfrak{T}^\tau$ defined by $\nu(A^\tau) = \mu(A)$.

If $f$ is a real-valued, 2 $\mathfrak{F}$-measurable, $\mu$-integrable function on $\Omega$, we denote by $E(f \mid \cdot)$ the conditional expectation of $f$ with respect to $t$. Corresponding to any particular function $h$ on $T$ (as, for example, $E(f \mid \cdot)$) we define the function $h^*$ on $\Omega$ by

$$h^*(x) = h(\tau), \qquad t(x) = \tau.$$

The qualification "essentially" prefixing a statement will mean that with the possible exception of a set of points of measure 0, that statement holds true.

The following two simple lemmas enable us to present the conditions for equality, in the results below, in terms of the elementary characteristics of the function $f$.


1 This note was prepared under O.N.R. contract.

2 With no changes in this note, and only minor changes in [1], the results we have set forth concerning unbiased estimation pertain as well to complex-valued functions.



Lemma 1. A necessary and sufficient condition that $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ a.e. ($\mu$) is that $\operatorname{sgn} f$ be essentially a function of $t$.

The necessity of the condition is clear. To prove sufficiency, let $f'$ be a function on $\Omega$ which is a.e. equal to $f$, and such that $\operatorname{sgn} f'$ is an (unqualified) function of $t$. Now if $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ does not hold a.e. ($\mu$), then there is a $\mathfrak{T}$-set, $A$, of positive measure, such that, for example, for $x \in A$, $f'(x) > 0$ while $E^*(f \mid x) \leq 0$. We then have the contradiction

$$0 < \int_A f'\, d\mu = \int_A f\, d\mu = \int_A E^*(f \mid \cdot)\, d\mu \leq 0.$$

Lemma 2. A necessary and sufficient condition that $f(x) = E^*(f \mid x)$ a.e. ($\mu$) is that $f$ be essentially a function of $t$.

Again the necessity is obvious. To show sufficiency, let $f'$ be a function on $\Omega$ which is a.e. equal to $f$, and is an (unqualified) function of $t$. Define $h$ on $T$ by

$$h(\tau) = f'(x), \qquad t(x) = \tau.$$

Then $h^* = f'$, and we have

$$\int_A f\, d\mu = \int_A f'\, d\mu = \int_{A^\tau} h\, d\nu, \qquad A \in \mathfrak{T}.$$

But this implies that $h(\tau) = E(f \mid \tau)$ a.e. ($\nu$), and therefore $f(x) = E^*(f \mid x)$ a.e. ($\mu$), as was to be shown.

3. Results. For a proof of the Hölder inequality that we use in establishing the following theorem, we refer the reader to [3] (p. 233).

Theorem. 3 Let $s \geq 1$. Then for almost all ($\mu$) $x$,

$$|E^*(f \mid x)|^s \leq E^*(|f|^s \mid x). \; ^4 \tag{1}$$

Equality holds a.e.

(i) for $s = 1$, if and only if $\operatorname{sgn} f$ is essentially a function of $t$;

(ii) for $s > 1$, if and only if $f$ is essentially a function of $t$.
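On a finite space the theorem can be checked by direct enumeration. The sketch below is an illustration only: the six-point space, the values of $f$, and the labels of $t$ are invented for the example, and `conditional_expectation` is a hypothetical helper. The three level sets of $t$ show strict inequality, equality for $s = 1$ only (constant sign), and equality for every $s$ ($f$ constant on the set).

```python
from collections import defaultdict


def conditional_expectation(values, probs, labels):
    """Return E*(values | x) as a list indexed like the sample points."""
    mass = defaultdict(float)
    weighted = defaultdict(float)
    for v, p, t in zip(values, probs, labels):
        mass[t] += p
        weighted[t] += p * v
    return [weighted[t] / mass[t] for t in labels]


f = [-2.0, 1.0, 3.0, 3.0, -1.0, -4.0]   # assumed values of f on six points
probs = [1.0 / 6.0] * 6                 # uniform probability measure
t = [0, 0, 1, 1, 2, 2]                  # the statistic t groups the points in pairs


def both_sides(s):
    """Left and right sides of inequality (1) at every sample point."""
    left = [abs(e) ** s for e in conditional_expectation(f, probs, t)]
    right = conditional_expectation([abs(v) ** s for v in f], probs, t)
    return left, right


for s in (1.0, 1.5, 2.0, 3.0):
    left, right = both_sides(s)
    print(s, [round(r - l, 6) for l, r in zip(left, right)])
```

The printed gaps are never negative; they vanish on the pair where $f$ is constant, and for $s = 1$ also on the pair where $f$ has constant sign.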

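Proof: Consider first the case $s = 1$. Let

$$S = \{x \in \Omega \mid E^*(f \mid x) > 0\},$$

$$S' = \Omega - S.$$

Then, for any $A \in \mathfrak{T}$,

$$\int_A |E^*(f \mid \cdot)|\, d\mu = \int_{SA} E^*(f \mid \cdot)\, d\mu - \int_{S'A} E^*(f \mid \cdot)\, d\mu$$

$$= \int_{SA} f\, d\mu - \int_{S'A} f\, d\mu \leq \int_A |f|\, d\mu = \int_A E^*(|f| \mid \cdot)\, d\mu.$$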

3 The proof we present here was suggested by the referee, and is much shorter than our own.

4 For s = 1 this inequality was used by Doob in "Regularity properties of certain families of chance variables", Trans. Amer. Math. Soc., Vol. 47 (1940), pp. 455-486 (Theorem 0.2).





Since $A$ is arbitrary, we have the result (1) with $s = 1$. It is clear that the equality sign holds a.e. ($\mu$) if and only if, except possibly for a set of measure 0, $f$ is positive on $S$ and non-positive on $S'$; that is, if and only if $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ a.e. ($\mu$). Applying Lemma 1, we have the equality condition as stated in the theorem.

Now let $s > 1$. To establish (1) it will suffice, by virtue of what has already been proved for $s = 1$, to consider $f \geq 0$ a.e. ($\mu$). We may then argue as follows. Unless (1) holds a.e., there is a $\mathfrak{T}$-set, $R$, of positive measure, and numbers $a > b \geq 0$ such that for $x \in R$,

$$[E^*(f \mid x)]^s \geq a,$$

and

$$E^*(f^s \mid x) \leq b.$$

But then, with an application of the Hölder inequality we meet a contradiction. For,

$$a[\mu(R)]^s \leq \left[\int_R E^*(f \mid \cdot)\, d\mu\right]^s = \left[\int_R f\, d\mu\right]^s \leq \int_R f^s\, d\mu \cdot [\mu(R)]^{s-1} = \int_R E^*(f^s \mid \cdot)\, d\mu \cdot [\mu(R)]^{s-1} \leq b[\mu(R)]^s,$$

which contradicts $a > b$. Thus, (1) is proved in general.

If $f(x) = E^*(f \mid x)$ a.e. ($\mu$), it is readily proved by a direct argument that then equality holds in (1) a.e. ($\mu$). Conversely, suppose equality in (1) holds a.e. Then we have, in fact, a.e.,

$$|E^*(f \mid x)| = E^*(|f| \mid x), \tag{2}$$

and

$$[E^*(|f| \mid x)]^s = E^*(|f|^s \mid x). \tag{3}$$

For brevity, denote the function $E^*(|f| \mid \cdot)$ by $v$. Since $f$ vanishes at almost all points where $v$ vanishes, we may write $|f| = wv$, where

$$w(x) = \begin{cases} 1, & v(x) = 0, \\ |f(x)|/v(x), & v(x) > 0. \end{cases}$$

(If $v$ vanishes almost everywhere, we are through.) For any $\mathfrak{T}$-measurable, real-valued function, $u$, on $\Omega$, we have

$$\int_\Omega u \cdot v\, d\mu = \int_\Omega u \cdot v \cdot w\, d\mu \tag{4}$$

when either of these integrals exists (cf. [4], p. 50, eq. (15)). Similarly, and taking account of the equality assumption (3) we have

$$\int_\Omega u \cdot v^s\, d\mu = \int_\Omega u \cdot v^s \cdot w^s\, d\mu. \tag{5}$$

In particular, consider the two functions

$$u_1(x) = \begin{cases} 1/v(x), & v(x) > 0, \\ 0, & v(x) = 0, \end{cases}$$

and

$$u_2(x) = \begin{cases} 1/v^s(x), & v(x) > 0, \\ 0, & v(x) = 0. \end{cases}$$

If

$$S_0 = \{x \in \Omega \mid v(x) > 0\},$$

it is seen that $u_1$ taken in conjunction with (4), and $u_2$ taken in conjunction with (5), bring out

$$\int_{S_0} w\, d\mu = \int_{S_0} w^s\, d\mu = \mu(S_0).$$

From this it follows (e.g., by the equality condition attending the Hölder inequality) that $w(x) = 1$ a.e. in $S_0$. Hence $|f(x)| = v(x)$ a.e. in $\Omega$. Therefore, by (2), $|f(x)| = |E^*(f \mid x)|$ a.e. But (2) also implies, as already shown, $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ a.e. Thus, finally, we have $f(x) = E^*(f \mid x)$ a.e. Now apply Lemma 2, and the proof of the theorem is complete.
Corollary 1. Let $s \geq 1$, and let $g_0$ denote the expectation of $f$. Then

$$\int_\Omega |E^*(f \mid \cdot) - g_0|^s\, d\mu \leq \int_\Omega |f - g_0|^s\, d\mu. \tag{6}$$

Equality holds

(i) for $s = 1$, if and only if $\operatorname{sgn} [f - g_0]$ is essentially a function of $t$,

(ii) for $s > 1$, if and only if $f$ is essentially a function of $t$.

This result expresses the domination over the $s$-th absolute central moment of the conditional expectation of $f$ by the corresponding moment of $f$ itself. It follows almost immediately from the theorem when we write (6) in the form

$$\int_\Omega |E^*(f - g_0 \mid \cdot)|^s\, d\mu \leq \int_\Omega E^*(|f - g_0|^s \mid \cdot)\, d\mu. \tag{7}$$

Thus, from the theorem we know that the integrand of the left-hand side of (7) is a.e. $\leq$ the integrand on the right. Hence (7) holds. Equality in (7) holds then if and only if the integrands are a.e. equal. The theorem therefore directly provides the equality conditions as stated.

Let $\mathfrak{M} = \{\mu_\theta, \theta \in \Theta\}$ be a family of probability measures on $\mathfrak{F}$; and $t$, a sufficient statistic for $\mathfrak{M}$ (cf. [5], p. 232, §5). Let $f$ be an unbiased estimate of the function $g$ on $\Theta$. For each $\mu_\theta \in \mathfrak{M}$, the conditional expectation, $E_\theta(f \mid \cdot)$, of $f$ with respect to $t$ is defined. Since conditional expectations are fully determined by conditional probabilities (although, in general, not as usual integrals; cf. [4], pp. 48, 49; also [5], p. 230) it follows from the sufficiency of $t$ that there exists a function $E(f \mid \cdot)$, on $T$, with $E_\theta(f \mid \tau) = E(f \mid \tau)$ a.e. ($\nu_\theta$) for each $\theta \in \Theta$. $E^*(f \mid \cdot)$ is again an unbiased estimate of $g$, and we have

Corollary 2. Let $t$ be a sufficient statistic for the family $\mathfrak{M} = \{\mu_\theta, \theta \in \Theta\}$; and $f$, an unbiased estimate of $g$. For $s \geq 1$, and each $\theta \in \Theta$,

$$\int_\Omega |E^*(f \mid \cdot) - g(\theta)|^s\, d\mu_\theta \leq \int_\Omega |f - g(\theta)|^s\, d\mu_\theta.$$

Equality holds

(i) for $s = 1$, if and only if $\operatorname{sgn} [f - g(\theta)]$ is essentially ($\mu_\theta$) a function of $t$;

(ii) for $s > 1$, if and only if $f$ is essentially ($\mu_\theta$) a function of $t$.
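A standard textbook case illustrates Corollary 2; it is not taken from the note itself. For $n$ independent Bernoulli trials with success probability $p$, the first observation is an unbiased estimate of $p$, the number of successes $k$ is sufficient, and the conditional expectation of the first observation given $k$ is $k/n$. The sketch below, with the hypothetical helper `abs_central_moments` and assumed values $n = 4$, $p = 0.3$, computes both absolute central moments exactly by enumeration.

```python
from itertools import product


def abs_central_moments(n: int, p: float, s: float):
    """s-th absolute central moments of x_1 and of its conditional expectation k/n."""
    raw = 0.0
    improved = 0.0
    for outcome in product((0, 1), repeat=n):
        k = sum(outcome)
        prob = p ** k * (1.0 - p) ** (n - k)
        raw += prob * abs(outcome[0] - p) ** s
        improved += prob * abs(k / n - p) ** s
    return raw, improved


for s in (1.0, 1.5, 2.0, 4.0):
    raw, improved = abs_central_moments(4, 0.3, s)
    print(s, raw, improved)
```

For $s = 2$ the two values are the familiar variances $p(1 - p)$ and $p(1 - p)/n$; the other rows show the same domination for moments of other orders.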

REFERENCES

[1] E. W. Barankin, "Locally best unbiased estimates", Annals of Math. Stat., Vol. 20 (1949), pp. 477-501.

[2] D. Blackwell, "Conditional expectation and unbiased sequential estimation", Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.

[3] L. M. Graves, The Theory of Functions of Real Variables, McGraw-Hill, 1946.

[4] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik, Vol. 2 (1933).

[5] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym Theorem to the theory of sufficient statistics", Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.


NOTE ON CONSISTENT ESTIMATES OF THE LINEAR STRUCTURAL 
RELATION BETWEEN TWO VARIABLES 1 

By Elizabeth L. Scott
University of California, Berkeley 

1. Introduction. The purpose of this note is to present another case in which 
the structural linear relation between two observable random variables may be 
consistently estimated, Of the recent papers on this subject I wish to mention the 
paper by Wald [1], which contains a history of the work done on the problem, 
and the more recent paper by Housner and Brennan [2]. Also relevant is the
important result due to Reiersøl [3], [4].

2. Statement of problem. Assume that the two observable random variables $x$ and $y$ have the structure

$$\begin{cases} x = \xi + u, \\ y = \alpha + \beta\xi + v, \end{cases} \tag{1}$$

where $\alpha$ and $\beta$ are unknown parameters to be estimated, and $\xi$, $u$ and $v$ are completely independent random variables. The latter two variables, interpreted as the random errors of measurement, are assumed to vary normally about zero with unknown variances $\sigma_u^2$ and $\sigma_v^2$, respectively.

1 Paper prepared with partial support of the Office of Naval Research. The results summarized were presented in a discussion held at the Cleveland Meeting of the Institute of Mathematical Statistics, December, 1948.

An increasing number $n$ of completely independent pairs of simultaneous values of $x$ and $y$ are to be observed

$$(x_i, y_i), \qquad i = 1, 2, \cdots, n, \tag{2}$$

so that each pair $(x_i, y_i)$ corresponds to a value $\xi_i$ of the unobservable random variable $\xi$ which is independent of the value $\xi_j$ of $\xi$ corresponding to any other pair $(x_j, y_j)$, $i \neq j$.

It is well known that if the distribution of $\xi$ is normal then the parameters $\alpha$, $\beta$, $\sigma_u^2$ and $\sigma_v^2$ are unidentifiable. Reiersøl proved [4] that these parameters are identifiable in all other cases. Wald and Housner and Brennan found consistent estimates of these parameters assuming that, although the particular values of $\xi$ are not known exactly, a certain amount of knowledge concerning the values of $\xi$ is available. The present note gives a method for obtaining a consistent estimate of $\beta$, which is the key to the problem of estimating the four parameters, for the case where it is known that a specified central moment of the distribution of $\xi$ exists and differs from that of the normal distribution.

Since work on this subject continues, the present brief note deals particularly with the simplest case, when one of the odd central moments of $\xi$ exists and differs from the "normal" value, zero. It will be observed that the hypotheses made here are of entirely different character from those adopted by other writers. The present note postulates knowledge concerning a moment of the distribution of $\xi$, whereas the papers quoted postulate some knowledge of the particular values assumed by $\xi$. The method adopted was suggested by a remark made by Neyman [5] in 1936.

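3. Preliminary theorems. Let

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \tag{3}$$

and let $b$ be an arbitrary real number.

Theorem 1: If $\mu_3$, the third central moment of $\xi$, exists then the arithmetic mean

$$F_{n,1}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - \bar{y} - b(x_i - \bar{x})]^3 \tag{4}$$

converges in probability to

$$(\beta - b)^3 \mu_3. \tag{5}$$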




Proof: Simple algebra gives

$$F_{n,1}(b) = (\beta - b)^3\, \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \bar{\xi})^3 + 3(\beta - b)^2\, \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \bar{\xi})^2 [v_i - \bar{v} - b(u_i - \bar{u})]$$

$$+\; 3(\beta - b)\, \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \bar{\xi}) [v_i - \bar{v} - b(u_i - \bar{u})]^2 + \frac{1}{n}\sum_{i=1}^{n} [v_i - \bar{v} - b(u_i - \bar{u})]^3. \tag{6}$$

It is obvious that further expansion will express $F_{n,1}(b)$ in terms of averages of the type

$$\frac{1}{n}\sum_{i=1}^{n} \xi_i^p u_i^q v_i^r, \tag{7}$$

with $p + q + r \leq 3$. Since all the terms over which each average is taken are completely independent, follow the same law and possess finite expectations, the familiar theorem of Khintchine assures that, as $n$ is increased, each average (7) tends in probability to its expectation. Using Slutsky's theorem (see Cramér [6], p. 255), we conclude that $F_{n,1}(b)$ tends in probability to the limit obtained by replacing each average in the expansion (6) by its expectation and then letting $n \to \infty$. The computations are easy and give

$$\lim_{n \to \infty} p\, F_{n,1}(b) = (\beta - b)^3 \mu_3. \tag{8}$$

Q.E.D.
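Theorem 1 can be illustrated by simulation. In the sketch below, $\xi$ is drawn from the exponential distribution with unit mean, whose third central moment is 2; this choice, the parameter values, the seed, and the function names are all assumptions for the example and do not come from the note.

```python
import random


def third_moment_statistic(xs, ys, b):
    """F_{n,1}(b) of (4): mean of the cubed residuals y_i - y_bar - b (x_i - x_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((y - y_bar - b * (x - x_bar)) ** 3 for x, y in zip(xs, ys)) / n


def simulate(n, alpha, beta, sigma_u, sigma_v, seed):
    """Draw n pairs (x_i, y_i) from structure (1) with exponential xi."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.expovariate(1.0)            # third central moment mu_3 = 2
        xs.append(xi + rng.gauss(0.0, sigma_u))
        ys.append(alpha + beta * xi + rng.gauss(0.0, sigma_v))
    return xs, ys


xs, ys = simulate(n=50000, alpha=1.0, beta=2.0, sigma_u=0.3, sigma_v=0.3, seed=1)
print(third_moment_statistic(xs, ys, 1.0))  # limit (2 - 1)^3 * 2 = 2
print(third_moment_statistic(xs, ys, 2.0))  # limit 0 at b = beta
```

The printed values only approximate the limits; the sampling error of a third-moment average is substantial even for large $n$.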

Let $\{X_n\}$ denote a sequence of observable random variables (multivariate or not) such that the distribution function of $X_n$ depends on the parameters $\theta_i$, with $a_i < \theta_i < b_i$, $i = 1, 2, \cdots, s$. Furthermore, let $\lambda$ denote a real variable and $\{\phi_n(X_n, \lambda)\}$ a sequence of functions of the arguments $X_n$ and $\lambda$ defined for all possible values of $X_n$ and for all values of $\lambda$ within the limits $a_1 \leq \lambda \leq b_1$.

Theorem 2: If the sequence of functions $\{\phi_n(X_n, \lambda)\}$ has the following properties:

(i) whatever be the true values $\theta_1', \theta_2', \cdots, \theta_s'$ of the parameters $\theta_i$ within the limits $a_i < \theta_i < b_i$, $i = 1, 2, \cdots, s$, as $n$ is increased, the sequence $\{\phi_n(X_n, \lambda)\}$ tends in probability to a function $f(\lambda, \theta_1')$ of arguments $\lambda$ and $\theta_1'$ only,

(ii) whatever be $\delta > 0$, there exist in $(a_1, b_1)$ two numbers $\lambda_1$ and $\lambda_2$, each differing from $\theta_1'$ by less than $\delta$ and such that the product $f(\lambda_1, \theta_1')\, f(\lambda_2, \theta_1')$ is negative,

(iii) for every $n$ and every possible value $x_n$ of $X_n$, the function $\phi_n(x_n, \lambda)$ is continuous with respect to $\lambda$ for $a_1 \leq \lambda \leq b_1$,

then whatever be $\epsilon > 0$ and $\eta > 0$ there exists a number $N_{\epsilon,\eta}$ such that for $n > N_{\epsilon,\eta}$ the probability that the equation $\phi_n(X_n, \lambda) = 0$ has a root between $\theta_1' - \epsilon$ and $\theta_1' + \epsilon$ exceeds $1 - \eta$.

Proof: Let $\epsilon > 0$ and $\eta > 0$ be two arbitrarily small numbers. Let $\lambda_1$ and $\lambda_2$ be two numbers such that $\lambda_i \in (a_1, b_1)$ and $|\theta_1' - \lambda_i| < \epsilon$, $i = 1, 2$, and such that $f(\lambda_1, \theta_1') < 0 < f(\lambda_2, \theta_1')$. Select $N_{\epsilon,\eta}$ so large that for $n > N_{\epsilon,\eta}$ the probability of having simultaneously

$$|\phi_n(X_n, \lambda_i) - f(\lambda_i, \theta_1')| < \tfrac{1}{2}\,|f(\lambda_i, \theta_1')| \quad \text{for } i = 1, 2 \tag{9}$$

differs from unity by less than $\eta$. It is clear that if the inequalities (9) are satisfied for any particular value $x_n$ of $X_n$, then

$$\phi_n(x_n, \lambda_1) < 0 < \phi_n(x_n, \lambda_2) \tag{10}$$

and the continuity of $\phi_n(x_n, \lambda)$ for $\lambda \in (a_1, b_1)$ implies that there exists a number $\lambda(x_n)$ between $\lambda_1$ and $\lambda_2$ such that $\phi_n(x_n, \lambda(x_n)) = 0$. Obviously $|\theta_1' - \lambda(x_n)| < \epsilon$. Thus, whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that the probability that $\phi_n(X_n, \lambda)$ has a root in the interval $(\theta_1' - \epsilon, \theta_1' + \epsilon)$ exceeds $1 - \eta$ provided $n > N_{\epsilon,\eta}$. This proves Theorem 2.

Theorem 2 is treated as a convenient lemma on which to base the proof of the existence of a consistent estimate of the parameter $\beta$ in (1). It is obvious, however, that this Theorem has an independent interest of its own.

4. Consistent estimates of the structural parameter $\beta$. Referring to the general set-up of the problem of estimating the structural parameter $\beta$ in (1) and using the notation (2) and (3), we prove the following theorems.

Theorem 3: If the third central moment of $\xi$ exists and differs from zero, then the equation

$$F_{n,1}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - \bar{y} - b(x_i - \bar{x})]^3 = 0 \tag{11}$$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.

Proof: According to Theorem 1, whatever be $b$ and $\beta$, the stochastic limit of $F_{n,1}(b)$ is $(\beta - b)^3 \mu_3$ and changes its sign as $b$ passes through the value $\beta$. Theorem 2 implies then that whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that for $n > N_{\epsilon,\eta}$ the probability that at least one of the roots of (11) will lie within $\beta - \epsilon$ and $\beta + \epsilon$ is greater than $1 - \eta$. This proves the theorem.
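Equation (11) is a cubic in $b$ whose coefficients are sample third moments, so a root can be located by bisection once a sign change is bracketed. The sketch below is illustrative: the exponential distribution for $\xi$, the parameter values, the seed, the bracket $[-50, 50]$, and the function names are assumptions for the example, not part of the note.

```python
import random


def cubic_coefficients(xs, ys):
    """Sample third moments that determine F_{n,1}(b) as a cubic in b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    s_yyy = sum(v ** 3 for v in dy) / n
    s_yyx = sum(v * v * u for u, v in zip(dx, dy)) / n
    s_yxx = sum(v * u * u for u, v in zip(dx, dy)) / n
    s_xxx = sum(u ** 3 for u in dx) / n
    return s_yyy, s_yyx, s_yxx, s_xxx


def f_n(coeffs, b):
    """F_{n,1}(b) = mean of (dy - b dx)^3, expanded in powers of b."""
    s_yyy, s_yyx, s_yxx, s_xxx = coeffs
    return s_yyy - 3.0 * b * s_yyx + 3.0 * b * b * s_yxx - b ** 3 * s_xxx


def root_by_bisection(coeffs, lo=-50.0, hi=50.0, iterations=200):
    """Locate one root of (11) inside an assumed bracket with a sign change."""
    f_lo = f_n(coeffs, lo)
    if f_lo * f_n(coeffs, hi) > 0.0:
        raise ValueError("no sign change in the assumed bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f_n(coeffs, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


rng = random.Random(7)
beta = 2.0
xs, ys = [], []
for _ in range(20000):
    xi = rng.expovariate(1.0)                # skewed xi, third central moment 2
    xs.append(xi + rng.gauss(0.0, 0.1))
    ys.append(1.0 + beta * xi + rng.gauss(0.0, 0.1))

b_hat = root_by_bisection(cubic_coefficients(xs, ys))
print(b_hat)  # close to beta = 2, though convergence is slow
```

Because the limit $(\beta - b)^3\mu_3$ has a triple root at $\beta$, small perturbations of the cubic move the root by an amount of the order of their cube root, so the estimate converges slowly.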

Generally, let $\mu_m$ denote the $m$-th central moment of $\xi$.

Theorem 4: If the distribution of $\xi$ has moments up to and including order $2m + 1$ and if at least one of the first $m$ odd central moments $\mu_{2k+1}$ differs from zero, $k = 1, 2, \cdots, m$, then the equation

$$F_{n,m}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - \bar{y} - b(x_i - \bar{x})]^{2m+1} = 0 \tag{12}$$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.

Proof: The proof of Theorem 4 exactly follows the lines of that of Theorem 3. Using (1), (2) and (3), we write

$$F_{n,m}(b) = \sum_{k=0}^{2m+1} C_{2m+1}^{k}\, (\beta - b)^k \left\{\frac{1}{n}\sum_{i=1}^{n} (\xi_i - \bar{\xi})^k\, [v_i - \bar{v} - b(u_i - \bar{u})]^{2m+1-k}\right\}. \tag{13}$$

It is easily seen that, as $n \to \infty$, $F_{n,m}(b)$ tends in probability to the limit

$$\lim_{n \to \infty} p\, F_{n,m}(b) = (\beta - b)^3\, \psi(\beta - b), \tag{14}$$

where $\psi(\beta - b)$ is a linear combination of even powers of $(\beta - b)$ with at least one coefficient different from zero. It follows that the stochastic limit of $F_{n,m}(b)$ changes its sign as $b$ passes through $\beta$ and the proof is completed by reference to Theorem 2.

Note that the stochastic limit of the first derivative of $F_{n,m}(b)$, evaluated at $b = \beta$, is zero, which is unfortunate. Furthermore, the order of contact of $F_{n,m}(b)$ at $b = \beta$ increases with the order of the first odd central moment of $\xi$ which differs from zero. Therefore, the precision of estimating $\beta$ may be expected to be better when the low odd central moments are not zero. Without narrowing the generality of the case considered, it is difficult to make an evaluation of the precision of the estimates obtained. Thus, for example, the familiar method of evaluating the asymptotic variance requires the knowledge of higher moments of $\xi$ than those considered here. For similar reasons, it is thus far impossible to speak of the relative efficiency of the estimates found. For this purpose it would be necessary to determine first the measure of the precision of the best estimate whose consistency persists independently of the distribution of $\xi$ provided only that at least one odd central moment differs from zero.

Once the consistent estimate $\hat{b}$ of $\beta$ is obtained, there is no particular difficulty in obtaining consistent estimates of the other parameters.

J. Neyman has pointed out [7] that Theorem 2 may be used as the basis for a very elementary proof of the consistency of maximum likelihood estimates.

REFERENCES 

[1] Abraham Wald, "The fitting of straight lines if both variables are subject to error," Annals of Math. Stat., Vol. 11 (1940), p. 284.
[2] G. W. Housner and J. F. Brennan, "The estimation of linear trends," Annals of Math. Stat., Vol. 19 (1948), p. 380.
[3] Olav Reiersøl, "Confluence analysis by means of lag moments and other methods of confluence analysis," Econometrica, Vol. 9 (1941), p. 1.
[4] Olav Reiersøl, "Identifiability of a linear relation between variables which are subject to error," Cowles Commission Discussion Papers, Statistics, No. 387 (1949).
[5] J. Neyman, Jour. Roy. Stat. Soc., Vol. 100 (1937), p. 50.
[6] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[7] J. Neyman, First Course in Probability and Statistics, Vol. 2, forthcoming.





ON MULTINOMIAL DISTRIBUTIONS WITH LIMITED FREEDOM:
A STOCHASTIC GENESIS OF PARETO’S AND PEARSON’S CURVES


By Maria Castellani
University of Kansas City


1. A multinomial law with limited freedom: Distribution functions of statistical equilibrium. We intend to consider here a convenient model of statistical mechanics, which by generalization of an approach used by Cantelli [1] shall give us either Pareto’s or Pearson’s curves. Let us imagine that $N$ elements ($N > k - 3$) have to be randomly distributed in a set of $k$ continuous intervals $t_i$ ($i = 1, 2, \ldots, k$) in $R_1$, the “a priori” probability associated with $t_i$ being $p_i$, for $\sum_{i=1}^{k} p_i = 1$. Assuming that the elements have no preferences, they move freely under the law of chance taking different configurations $(n_1, n_2, \ldots, n_k)$, with probabilities $P(n_1, n_2, \ldots, n_k)$, $n_i$ being the number of elements placed in $t_i$ and $\sum_{i=1}^{k} n_i = N$. The random variable $Y(t)$ representing the total number of configurations $(n_1, n_2, \ldots, n_k)$ therefore obeys a multinomial law with $k - 1$ degrees of freedom, viz.:

(1) $P[n_1, n_2, \ldots, n_k] = N! \prod_{i=1}^{k} \frac{p_i^{n_i}}{n_i!}, \qquad \sum_i n_i = N, \qquad \sum_i p_i = 1.$


We shall proceed to admit that the elements are not free, but that they have preferences in the choice of a suitable interval. This fact we associate with the assumption that some forces of attraction are made to play in each interval. For the sake of simplicity we shall consider that there are two independent forces, say $\mu(t)$ and $\nu(t)$, whose convenient potential functions are respectively $f(t)$ and $\varphi(t)$, where

(2) $\frac{df(t)}{dt} = \mu(t), \qquad \frac{d\varphi(t)}{dt} = \nu(t).$


These potential functions we may, for instance, associate with the significance of certain quanta whose total is to be distributed among the elements and whose significance must be established by consideration of the particular statistical experiment. It is then admissible, at least in our first approach, to assume these potential quantities to have a total constant magnitude, viz., $\sum n_i f(t_i) = H_1$, $\sum n_i \varphi(t_i) = H_2$, where $H_1$ and $H_2$ are appropriate constants. This condition is analogous to the assumption in statistical mechanics of the preservation of energy. This analogy enables us to follow classical methods. We shall call our method the method of “intervals of energy.” Let us say that our system reaches its canonic state when $P(n_1, \ldots, n_k)$ is a maximum [2]. When this state occurs with a probability close to the value one, we may say it is in statistical equilibrium. It is well known that $P(n_1, \ldots, n_k)$ reaches its maximum when:

(3) $\delta P(n_1, \ldots, n_k) = 0$ or $\delta \log P(n_1, \ldots, n_k) = 0.$





Proceeding as usual, for example, as in [3], we ultimately obtain:

(4) $n_i = N p_i e^{-a - b f(t_i) - c \varphi(t_i)},$

where $a$, $b$, and $c$ are arbitrary constants.

If $N$ is sufficiently large, $n_i/N$ may be considered a probability, and precisely the probability for $Y(t)$ to score $n_i$ times the value $t_i$ when the canonical state is reached. The problem may then be extended assuming that a continuous function may interpolate the discrete values $n_1, n_2, \ldots, n_k$. Putting $y = n_i/N$, assuming for the sake of simplicity that $f(t)$ and $\varphi(t)$ along with their derivatives are continuous functions of $t$, and grouping these constants into a single $K$, formula (4) becomes:

(5) $y = K e^{-b f(t) - c \varphi(t)};$

then:

(6) $\frac{d \log y}{dt} = \frac{1}{y}\frac{dy}{dt} = -b\frac{df(t)}{dt} - c\frac{d\varphi(t)}{dt} = -b\mu(t) - c\nu(t).$

Equation (6) is a generalization of the familiar Pearson differential equation which generates his system of curves. It is obvious that (6) may determine a large set of frequency curves, depending on the form of $f(t)$ and $\varphi(t)$.

The above analysis may be extended to any number of acting forces provided they are less than $k - 1$ in number.

2. Stochastic genesis of the Pareto and Pearson curves. We shall next show how the Pareto and Pearson curves belong to this family of frequency curves.

The Pearsonian system of curves is derived by comparing its differential equation with (6) to determine in these the most suitable functions for $\mu(t)$ and $\nu(t)$. Thus,

(7) $\frac{1}{y}\frac{dy}{dt} = \frac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = -b\mu(t) - c\nu(t).$

Corresponding to the decomposition into partial fractions of the middle term, we have two sets of curves. When

$\beta_1 + \beta_2 t + \beta_3 t^2 = \beta_3 (t - \gamma_1)(t - \gamma_2)$

and $\gamma_1$, $\gamma_2$ are real numbers, then

(8) $-b\mu(t) = \frac{\gamma_1 + a}{\beta_3(\gamma_1 - \gamma_2)(t - \gamma_1)} = \frac{p}{t - \gamma_1}, \qquad -c\nu(t) = \frac{\gamma_2 + a}{\beta_3(\gamma_2 - \gamma_1)(t - \gamma_2)} = \frac{q}{t - \gamma_2},$

$p$ and $q$ being suitable grouping constants.
Under these assumptions two forces are acting in each “class of energy”, each one being inversely proportional to the distance of the interval from some origin.





Substituting (8) into (6) and integrating, we obtain corresponding to (5), after grouping the exponential constants into $K$:

$y = K (t - \gamma_1)^{p} (t - \gamma_2)^{q},$

where $K$, $\gamma_1$, $\gamma_2$, $p$, $q$ also have the significance of statistical constants according to which we obtain Pearson’s curves of Type I or VI.
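
The integration step can be checked numerically. The following sketch, under assumed and purely hypothetical values of $a$, $\beta_3$, $\gamma_1$, $\gamma_2$, compares a finite-difference logarithmic derivative of $y = K(t - \gamma_1)^p (t - \gamma_2)^q$ with the right-hand side of the Pearson equation for $t$ beyond both roots.

```python
import math

# Hypothetical constants of the Pearson equation (not taken from the paper).
beta3, a = 0.5, 0.3
g1, g2 = 1.0, 2.0            # real roots gamma_1, gamma_2 of the denominator

# Grouping constants from the partial-fraction decomposition.
p = (g1 + a) / (beta3 * (g1 - g2))
q = (g2 + a) / (beta3 * (g2 - g1))

def log_y(t, K=1.0):
    """log of y = K (t - g1)^p (t - g2)^q, valid for t > max(g1, g2)."""
    return math.log(K) + p * math.log(t - g1) + q * math.log(t - g2)

def pearson_rhs(t):
    """(t + a) / (beta3 (t - g1)(t - g2)), the middle term of the Pearson equation."""
    return (t + a) / (beta3 * (t - g1) * (t - g2))

def log_derivative(t, h=1e-6):
    """Central-difference approximation to d log y / dt."""
    return (log_y(t + h) - log_y(t - h)) / (2 * h)

max_gap = max(abs(log_derivative(t) - pearson_rhs(t)) for t in (2.5, 3.0, 4.0, 5.0))
print(p, q, max_gap)
```

The gap printed last should be of the order of the finite-difference error only, which is what the agreement of the two sides of the equation would lead one to expect.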

When $\beta_3 = 0$, we have by the same process:

$-b\mu(t) = \frac{1}{\beta_2} = q_2, \qquad -c\nu(t) = \frac{a - \beta_1/\beta_2}{\beta_1 + \beta_2 t} = \frac{q_1}{p + t}.$

Hence, by grouping together the statistical constants under $K$, $p$, $q_1$, $q_2$, we obtain:

$y = K (p + t)^{q_1} e^{q_2 t},$

which is a Pearson curve of Type III.¹ In each class interval two forces are acting; one is constant and the other is inversely proportional to the distance of the interval from a fixed origin.

We obtain a Pareto curve when in (8) either $p$ or $q$ is zero. Under the indicated assumptions the Pareto income distribution curve appears in a new light. If the acting forces are reduced to one, and this one force is inversely proportional to the distance of the interval from some origin, the Pareto curve represents a special case of the Pearsonian curve.

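In (7), we now consider the decomposition of the Pearson function for the case where the denominator does not have real roots. This decomposition may be indicated as follows:

$\frac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = \frac{t + \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + \beta_1/\beta_3 - \beta_2^2/4\beta_3^2\}} + \frac{a - \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + \beta_1/\beta_3 - \beta_2^2/4\beta_3^2\}}.$

Setting

$\frac{\beta_2}{2\beta_3} = p_1, \qquad a - \frac{\beta_2}{2\beta_3} = p_2, \qquad \frac{\beta_1}{\beta_3} - \frac{\beta_2^2}{4\beta_3^2} = q,$

we have

$-b\mu(t) = \frac{t + p_1}{\beta_3\{(t + p_1)^2 + q\}}, \qquad -c\nu(t) = \frac{p_2}{\beta_3\{(t + p_1)^2 + q\}}.$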


¹ A. L. Bowley has found, in his well-known analysis of food expenditures of urban families, that the distribution of weekly family expenditures can be best expressed by a Pearson curve of Type III. This is not surprising, since it is exactly a case where we can assume the joint effect of a constant factor and another factor acting in inverse proportion to the interval (again in the sense of the distance from a suitable origin). The constant factor in our case is the human need of food, while the factor acting inversely to the interval can be taken as a response to prices. See [4].





These we may interpret as forces of the Newtonian type. By grouping the statistical constants appropriately under $K$, $p_1$, $q$, $m_1$, $m_2$, we derive from (7) the following equation:

$y = K[(t + p_1)^2 + q]^{m_1}\, e^{m_2 \tan^{-1}[(t + p_1)/\sqrt{q}]}.$

This is the familiar Pearson curve of Type IV.

Other distributions of the same family can be easily found by the same method.

3. The frequency curves and their statistical equilibrium. The conclusive step in this analysis is in finding the probability of the most likely configuration. By generalizing a process of statistical mechanics first used by Castelnuovo [5], we assume any configuration $(n_1, n_2, \ldots, n_k)$ slightly different from the most probable $(\bar{n}_1, \ldots, \bar{n}_k)$ (the canonical configuration). Setting

$n_i = \bar{n}_i + \alpha_i \qquad (i = 1, 2, \ldots, k),$

we have by conditions (1) and (3):

(9) $\sum_{i=1}^{k} \alpha_i = 0, \qquad \sum_{i=1}^{k} \alpha_i f(t_i) = 0, \qquad \sum_{i=1}^{k} \alpha_i \varphi(t_i) = 0,$

,„u •-= .v'li-^-piv 


The sum of the values of $P(n_1, \ldots, n_k)$ will give us the total probability of scoring an $n_i$ slightly different from $\bar{n}_i$. Let us designate by $\Pi$ the total probability of having $(n_1, \ldots, n_k)$ satisfying all above conditions. By following Castelnuovo’s method [2], [5], we obtain:


II = E P [-fi.nb 




[nl,« 2 i 


E (, mj 


_l^«l 
|«nl 71 \ 


We determine all integral sets of $n_i$ compatible with (9), and with a condition of size

$\sum_{i=1}^{k} \frac{\alpha_i^2}{\bar{n}_i} < 2u_0.$

By a well-known process [2], [5], for any $u_0$

$\Pi = \frac{1}{\Gamma\left(\frac{k-3}{2}\right)} \int_0^{u_0} u^{(k-5)/2} e^{-u}\, du.$

This is the familiar Chi-square distribution function with $(k - 3)$ degrees of freedom. By considering $u_0$ as increasing with $N$, we can conclude that

$\lim_{N \to \infty} \Pi = 1.$
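
To see how quickly such a probability approaches one, the Chi-square distribution function with $(k - 3)$ degrees of freedom can be evaluated directly. The sketch below uses the standard series for the regularized incomplete gamma function; the function name `chi_square_cdf`, the number of classes and the bounds are hypothetical illustration values.

```python
import math

def chi_square_cdf(x, dof):
    """P(chi^2 with `dof` degrees of freedom <= x), via the series for the
    regularized lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = dof / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= z / (s + k)          # next term z^k / (s (s+1) ... (s+k))
        total += term
        k += 1
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))

k_classes = 8                        # hypothetical number of class intervals
probabilities = [chi_square_cdf(bound, k_classes - 3) for bound in (1.0, 5.0, 20.0, 60.0)]
print(probabilities)
```

The printed values increase toward one as the bound grows, which is the behaviour the limiting statement relies on.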


The state of maximum likelihood has a real significance only if it is almost certain that we will obtain either such a state or any one practically equivalent to it. This occurs when the state of maximum probability has little chance to change; it is a so-called stationary state or state of statistical equilibrium. It would mean a great deal if we could say through how many states the statistical phenomena must pass before attaining equilibrium, or in other words, whether the ergodic hypothesis of the kinetic theory of gases applies to certain social or economic phenomena. We will not go further into this now; the results obtained here must be considered as an initial exploratory step, which does permit us, however, to end with the following conclusive statement:

If N elements, provided N is large enough, are distributed at random in
k class “intervals of energy”, it is highly probable that they will approach
a configuration of statistical equilibrium, a distribution of maximum probability.
Pareto’s and Pearson’s curves represent special configurations of
statistical equilibrium in a stochastic system.


REFERENCES 

[1] F. P. Cantelli, “Sulla deduzione delle leggi di frequenza da considerazioni di probabilità,” Metron, Vol. 1 (1921), N. 3.
[2] G. Castelnuovo, Calcolo delle Probabilità, Roma, 1919.
[3] R. B. Lindsay, Introduction to Physical Statistics, New York, 1941.
[4] A. L. Bowley, Elements of Statistics, London, 1926.
[5] J. L. Coolidge, An Introduction to Mathematical Probability, Oxford, London, 1925.


ON THE COMPLETELY UNBIASSED CHARACTER OF TESTS OF INDE¬ 
PENDENCE IN MULTIVARIATE NORMAL SYSTEMS 

By R. D. Narain

Indian Council of Agricultural Research 

1. Introductory. To prove the unbiassed character of likelihood ratio tests like the test of significance of the multiple correlation coefficient or Hotelling’s $T^2$ test, Daly [1] used the non-null frequency distributions of these test criteria. This leads to obvious difficulties when tackling the general regression problem and the test of independence of several sets of variates, and Daly [1] has shown only their locally unbiassed character.

This paper demonstrates an approach which does not require an explicit knowledge of the frequency distribution of the test criteria, and it has been possible to prove that the likelihood ratio test for the general regression problem and the Wilks criterion for independence of sets of variates are completely unbiassed. The argument proceeds in a chain, the unbiassedness of the Wilks criterion following ultimately from the unbiassedness of the t-test. The link-up has been achieved by working with a chain of conditional distribution densities, a principle employed earlier by the author [3], [4] in presenting a unified distribution theory of the common statistical coefficients relevant to normal theory.





2. The t-test. As the simplest illustration of the procedure which is applicable generally, consider the t-test for the significance of the mean of a normal population. Let the frequency function of a sample of size $n$ be

(1) $(2\pi V)^{-n/2} \exp\left[-\frac{1}{2V}\sum_{i=1}^{n}(x_i - m)^2\right] \prod_{i=1}^{n} dx_i.$

The region $W - w$ complementary to the critical region $w$ for testing the hypothesis

$m = 0$

is given by

$\bar{x}^2 < k^2\chi^2,$

where $k$ is a positive constant depending on the size of $w$ and

$n\bar{x} = \sum_{i=1}^{n} x_i, \qquad \chi^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2.$

We write

$I(m) = \int_0^{\infty} \left\{ \int_{-k\chi}^{k\chi} \left(\frac{n}{2\pi V}\right)^{1/2} e^{-(n/2V)(\bar{x} - m)^2}\, d\bar{x} \right\} f(\chi^2)\, d(\chi^2),$

where

$f(\chi^2)\, d(\chi^2)$

is the frequency function of $\chi^2$, which is distributed independently of $\bar{x}$. To show that the test is completely unbiassed is equivalent to showing that

$I(m) < I(0)$ for all values of $m$.

We have

$\frac{dI}{dm} = \int_0^{\infty} \left(\frac{n}{2\pi V}\right)^{1/2} \left\{ e^{-(n/2V)(k\chi + m)^2} - e^{-(n/2V)(k\chi - m)^2} \right\} f(\chi^2)\, d(\chi^2),$

which is positive or negative according as $m$ is negative or positive. Therefore

$I(m) < I(0).$
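
The heart of the argument is that, for every fixed value of $\chi$, the inner integral over $\bar{x}$ is largest at $m = 0$. A minimal sketch of that inner probability follows; the function names and the values of $k$, $n$, $V$ and $\chi$ are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inner_acceptance(m, chi, k=0.5, n=10, V=1.0):
    """P(-k*chi < xbar < k*chi) when xbar ~ N(m, V/n) and chi is held fixed."""
    scale = math.sqrt(n / V)
    return normal_cdf(scale * (k * chi - m)) - normal_cdf(scale * (-k * chi - m))

# Acceptance probability of the inner region for increasing true means m.
values = [inner_acceptance(m, chi=1.2) for m in (0.0, 0.3, 0.6, 1.0)]
print(values)
```

Since the inequality holds for each fixed $\chi$, it is preserved after integrating against the frequency function of $\chi^2$, whatever that function may be.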

3. The $E^2$- and $R^2$-tests. Let the frequency function of $n$ observations of a random variate $x_p$ be

(3) $(2\pi V)^{-n/2} \exp\left[-\frac{1}{2V}\sum_{i=1}^{n}\left(x_{ip} - \sum_{r=1}^{p-1}\beta_r x_{ir}\right)^2\right] \prod_{i} dx_{ip}.$

With the usual notation for partial variates in regression analysis, the critical region $w$ based on the likelihood ratio test for the hypothesis

$0 = \beta_m = \beta_{m+1} = \cdots = \beta_{p-1}, \qquad m < p - 1,$





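is given by

$1 - E^2 = \frac{\sum_i x^2_{ip\cdot(12\cdots p-1)}}{\sum_i x^2_{ip\cdot(12\cdots m-1)}} <$ a positive constant.

It can be shown [2], [3] that this ratio can be expressed in the form

$1 - E^2 = \frac{\chi^2}{\chi^2 + \sum_{r=m}^{p-1} z_r^2},$

where the frequency function of $\chi^2$ and the $z_r$ is

(4) $\frac{1}{(2\pi V)^{(p-m)/2}\,\Gamma\left(\frac{n-p+1}{2}\right)(2V)^{(n-p+1)/2}} \exp\left[-\frac{1}{2V}\left\{\chi^2 + \sum_{r=m}^{p-1}(z_r - \eta_r)^2\right\}\right] (\chi^2)^{(n-p-1)/2}\, d(\chi^2) \prod_{r} dz_r.$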


The hypothesis to be tested then becomes

$0 = \eta_m = \eta_{m+1} = \cdots = \eta_{p-1}.$

The region $W - w$ complementary to $w$ is given by

$\sum_r z_r^2 < k\chi^2,$

where $k$ is a positive constant determined by the size of $w$. Denote by $I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m)$ the integral of (4) over the region $W - w$. Differentiating $I$ with respect to $\eta_{p-1}$, performing the integration with respect to $z_{p-1}$ and arguing exactly as in section 2 above we obtain

$I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m) < I(0, \eta_{p-2}, \eta_{p-3}, \ldots, \eta_m).$

Note that $z_{p-2}$ is distributed independently of $z_{p-1}$. Therefore starting with $\eta_{p-1} = 0$ in (4) and considering the integration with respect to $z_{p-2}$ first, we obtain as before $I(0, \eta_{p-2}, \ldots, \eta_m) < I(0, 0, \eta_{p-3}, \ldots, \eta_m)$ and thus finally

$I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m) < I(0, 0, \ldots, 0),$

which proves the completely unbiassed character of the $E^2$-test. The test of significance of the multiple correlation coefficient with any number of the predicting variates being fixed or random may be considered as a corollary to the above. We have only to multiply the frequency function (3) by a factor $dF$ representing the frequency function of the random predicting variates (which need not necessarily be normal). This does not affect either the test criterion or the arguments showing its unbiassedness. The test of significance of the multiple correlation coefficient is thus completely unbiassed.





4. The general regression problem. Given the distribution

(5) $(2\pi)^{-n(p-m)/2}\,|\alpha_{rs}|^{n/2} \exp\left[-\frac{1}{2}\sum_{r,s}\alpha_{rs}\sum_{i}\left(x_{ir} - \sum_{h}\beta_{rh}x_{ih}\right)\left(x_{is} - \sum_{h}\beta_{sh}x_{ih}\right)\right] \prod_{i,r} dx_{ir},$

$i = 1, 2, \ldots, n,$
$h = 1, 2, \ldots, l, l + 1, l + 2, \ldots, m,$
$r, s = m + 1, m + 2, \ldots, p,$
$n > p > m > l,$

where the matrix $\|x_{ih}\|$ is of rank $m$. The hypothesis $H$ to be tested is

$\beta_{rv} = 0, \qquad r = m + 1, m + 2, \ldots, p, \qquad v = l + 1, l + 2, \ldots, m.$

The likelihood ratio test gives the critical region defined by

$\lambda = \frac{|a_{rs}|}{|a'_{rs}|} <$ a positive constant,

where, with the usual regression notation for partial variates,

$a_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots m)}\, x_{is\cdot(12\cdots m)}, \qquad a'_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots l)}\, x_{is\cdot(12\cdots l)}, \qquad r, s = m + 1, m + 2, \ldots, p.$

Now we note that

(6) $\lambda = \prod_{r=m+1}^{p}(1 - E_r^2) = (1 - E_p^2)\prod_{r=m+1}^{p-1}(1 - E_r^2), \qquad 1 - E_r^2 = \frac{\sum_{i=1}^{n} x^2_{ir\cdot(12\cdots l,\, l+1,\, l+2, \cdots m,\, m+1, \cdots r-1)}}{\sum_{i=1}^{n} x^2_{ir\cdot(12\cdots l,\, m+1,\, m+2, \cdots r-1)}}.$

Since the statistic $\lambda$ is invariant to linear transformations of the random variates $x_{m+1}, x_{m+2}, \ldots, x_p$, the distribution (5) may be simplified to

(7) $\prod_{r=m+1}^{p} (2\pi V_r)^{-n/2} \exp\left[-\frac{1}{2V_r}\sum_{i}\left(x_{ir} - \sum_{h}\beta_{rh}x_{ih}\right)^2\right] \prod_{i} dx_{ir},$

$i = 1, 2, \ldots, n,$
$h = 1, 2, \ldots, m.$

Denote by $I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v})$ the integral of (7) over the region $W - w$ complementary to the critical region $w$, where $\beta_{rv}$ in $I$ stands for the entire set of parameters $\beta_{r,l+1}, \beta_{r,l+2}, \ldots, \beta_{rm}$. We may first integrate over a subregion of $W - w$ over which $\prod_{r=m+1}^{p-1}(1 - E_r^2)$ has a given value. Using identity (6) and the result of section 3 it follows immediately that

$I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v}) \le I(0, \beta_{p-1,v}, \beta_{p-2,v}, \ldots, \beta_{m+1,v}).$

If $\beta_{pv} = 0$, the distribution of $E_p^2$ is independent of that of $E_{p-1}^2$. Hence, starting with $\beta_{pv} = 0$ in (7) and considering the integration for $E_{p-1}^2$ first, we obtain

$I(0, \beta_{p-1,v}, \beta_{p-2,v}, \ldots, \beta_{m+1,v}) \le I(0, 0, \beta_{p-2,v}, \ldots, \beta_{m+1,v}).$

Thus finally

$I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v}) < I(0, 0, \ldots, 0),$

which proves the completely unbiassed character of the test.


5. Test of independence of sets of variates. Consider $n$ observations of $q$ sets of random variates distributed in the multivariate normal form

(8) $\text{Const} \times \exp\left[-\frac{1}{2}\sum_{i}\sum_{r,s}\alpha_{rs}(x_{ir} - m_r)(x_{is} - m_s)\right] \prod_{i,r} dx_{ir},$

$i = 1, 2, \ldots, n,$
$r = 1, 2, \ldots, l_1, l_1 + 1, l_1 + 2, \ldots, l_2, l_2 + 1, \ldots, l_3, \ldots, l_q,$
$n > l_q.$


Denote by $D_j$ the determinant of the sample dispersion matrix of the $j$th set of variates and by $D(j)$ the determinant of the dispersion matrix of the first $j$ sets taken together. The Wilks statistic used for testing the independence of the $q$ sets is given by

(9) $W = \frac{D(q)}{\prod_{j=1}^{q} D_j} = \prod_{j=2}^{q} \lambda_j,$

where

$\lambda_j = \frac{D(j)}{D_j\, D(j-1)}, \qquad j = 2, 3, \ldots, q.$
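
The factorization of the Wilks statistic into successive ratios is a telescoping product, which can be confirmed on a small example. The sketch below uses a hypothetical 4 × 4 dispersion matrix split into three sets; the helper names `det` and `submatrix` and all numbers are assumptions made for illustration.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def submatrix(matrix, idx):
    """Principal submatrix on the index list idx."""
    return [[matrix[i][j] for j in idx] for i in idx]

# Hypothetical sample dispersion matrix for 4 variates split into q = 3 sets.
S = [[4.0, 1.0, 0.5, 0.2],
     [1.0, 3.0, 0.3, 0.1],
     [0.5, 0.3, 2.0, 0.4],
     [0.2, 0.1, 0.4, 1.0]]
sets = [[0, 1], [2], [3]]

D = [det(submatrix(S, s)) for s in sets]                        # D_j
cumulative = [sum(sets[:j + 1], []) for j in range(len(sets))]  # first j sets together
D_cum = [det(submatrix(S, idx)) for idx in cumulative]          # D(j)

wilks = D_cum[-1]
for d in D:
    wilks /= d                                                  # D(q) / prod_j D_j

ratios = [D_cum[j] / (D[j] * D_cum[j - 1]) for j in range(1, len(sets))]
product = 1.0
for r in ratios:
    product *= r                                                # prod_{j>=2} lambda_j
print(wilks, product)
```

The two printed numbers agree up to rounding because $D(1) = D_1$, so every intermediate $D(j)$ cancels in the product.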


The region $W - w$ complementary to the critical region $w$ is defined by

$\prod_{j=2}^{q} \lambda_j >$ a positive constant.

The statistic $W$ is invariant to linear transformations within each set of variates. The distribution (8) may therefore without loss of generality be written in the form


do) n 


Lr-l,_l+l 


il (2*7;) 


2n-(«/2) 



1,-1 


( ) 2 )n 

/ t.r 


dx , 





Let $B_j$ ($j = 2, 3, \ldots, q$) stand for the set of constants

$\beta_{rs}, \qquad r = l_{j-1} + 1, l_{j-1} + 2, \ldots, l_j, \qquad s = 1, 2, \ldots, l_{j-1},$

and let

(11) $B_j = 0$

imply the vanishing of all the constants of the set $B_j$. The $q$ sets of variates will be independent if (11) holds for all values of $j$ from 2 to $q$. Denote by $I(B_q, B_{q-1}, \ldots, B_2)$ the integral of (10) over the region $W - w$. Integrating first over the sub-region of $W - w$ for which

$\prod_{j=2}^{q-1} \lambda_j$

has a given value and using the result of section 4, it follows that

$I(B_q, B_{q-1}, \ldots, B_2) < I(0, B_{q-1}, \ldots, B_2).$

Also if $B_q = 0$, $\lambda_q$ is distributed independently of $\lambda_{q-1}$. Hence starting with $B_q = 0$ in (10) and integrating for $\lambda_{q-1}$ first, we obtain

$I(0, B_{q-1}, B_{q-2}, \ldots, B_2) < I(0, 0, B_{q-2}, \ldots, B_2).$

Thus finally, $I(B_q, B_{q-1}, \ldots, B_2) < I(0, 0, \ldots, 0)$, which proves the completely unbiassed character of the Wilks criterion.


REFERENCES 

[1] J. F. Daly, “On the unbiased character of likelihood ratio tests for independence in normal systems,” Annals of Math. Stat., Vol. 11 (1940), p. 1.
[2] P. C. Tang, “The power function of analysis of variance tests with tables and illustrations of their use,” Statistical Research Memoirs, Vol. 2 (1938), p. 126.
[3] R. D. Narain, “A new approach to sampling distributions of the multivariate normal theory. Part I,” Jour. Ind. Soc. Agr. Stat., Vol. 1 (1948), p. 59.
[4] R. D. Narain, “A new approach to sampling distributions of the multivariate normal theory. Part II,” Jour. Ind. Soc. Agr. Stat., Vol. 1 (1948), p. 137.


ON THE DISTRIBUTION OF THE TWO CLOSEST AMONG A SET OF
THREE OBSERVATIONS¹

By G. R. Seth

Iowa State College

1. Introduction. In this note we obtain the joint distribution of the two closest observations $x'$, $x''$ ($x' < x''$) of the set $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$) when the distribution of $x_1, x_2, x_3$ is given or can be obtained.² We will assume that in general the density function is given by $f(x_1, x_2, x_3)$ and that it is continuous in the variables involved. We also find the distributions of certain statistics depending on $x'$ and $x''$. We will denote the density and the cumulative distribution function of a normal variate with mean zero and unit variance by $\phi(x)$ and $G(x)$.

¹ The results in this paper were presented at a meeting of the Institute of Mathematical Statistics in Madison, Wisconsin, September 9, 1948.

² The author’s attention was drawn to this problem while visiting the National Bureau of Standards in the Spring of 1948, by Mr. Julius Lieblein of the Statistical Engineering Laboratory. He understands that Mr. Lieblein has in preparation for submission to the Journal of Research of the National Bureau of Standards a paper giving intensive consideration to the closest pair and other aspects of samples of three observations.

2. Distribution of the two closest. Let $x'$, $x''$ be the two closest among the set of $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$). Let $P(S_1, S_2, \ldots, S_j)$ denote the probability that the events $S_1, S_2, \ldots, S_j$ occur. Let us consider $P(x' < s, x'' < t)$ for $t > s$. For $s > t$, it reduces to $P(x'' < t)$, i.e. the marginal cumulative distribution of $x''$.

Now

(1) $P(x' < s, x'' < t) = P(x_1 < s, x_2 < t, x_2 - x_1 < x_3 - x_2) + P(x_2 < s, x_3 < t, x_3 - x_2 < x_2 - x_1).$

The equalities, here as well as elsewhere, are omitted as the variables admit continuous distributions. Let the first and second terms on the right side in (1) be denoted by $P(A)$ and $P(B)$ respectively, where $A$, $B$ denote the events in the respective brackets. The event $B$ can be further split up into more elementary events whose probabilities can be easily found. $(B)$ can be seen to be equivalent to

$(x_1 < 2s - t,\; x_1 < x_2 < \tfrac{1}{2}(x_1 + t),\; x_2 < x_3 < 2x_2 - x_1)$
$+ (2s - t < x_1 < s,\; x_1 < x_2 < s,\; x_2 < x_3 < 2x_2 - x_1)$
$+ (x_1 < 2s - t,\; \tfrac{1}{2}(x_1 + t) < x_2 < s,\; x_2 < x_3 < t).$

We may write (1) in the form of integrals, and differentiating under the integral sign with respect to $t$ and $s$ we obtain

(2) $\frac{\partial^2 P}{\partial t\, \partial s} = \int_{2t - s}^{\infty} f(s, t, x_3)\, dx_3 + \int_{-\infty}^{2s - t} f(x_1, s, t)\, dx_1.$

The right hand side of (2) gives the density function of $x'$, $x''$ at $x' = s$, $x'' = t$. Let $f_{ij}(x_i, x_j)$ be the density function of $x_i$ and $x_j$ ($i \ne j = 1, 2, 3$). Then the density function $p(x', x'')$ of $x'$ and $x''$ can be put into the form

(3) $p(x', x'') = f_{12}(x', x'')\,[1 - F_3(2x'' - x' \mid x_1 = x', x_2 = x'')] + f_{23}(x', x'')\,[F_1(2x' - x'' \mid x_2 = x', x_3 = x'')],$

where $F_i(x \mid x_j = l, x_k = m)$ represents the cumulative distribution function of the conditional density function of $x_i$ when $x_j$ and $x_k$ are fixed at the values $l$ and $m$ respectively. If, before ordering, the three observations are independent and from the same population having the density function $f(x)$, then (3) with the help of

$f(x_1, x_2, x_3) = 6 f(x_1) f(x_2) f(x_3)$

reduces to

(4) $p(x', x'') = 6 f(x') f(x'')\,[1 - F(2x'' - x') + F(2x' - x'')],$

where $F(x) = \int_{-\infty}^{x} f(x)\, dx$.


3. Joint distribution of $(x'' - x')$ and $(x'' - x')/(x_3 - x_1)$. Let $F_1(s, t)$ denote the cumulative distribution function of $u = x'' - x'$ and $w = (x'' - x')/(x_3 - x_1)$. Then

(5) $F_1(s, t) = P\left[x'' - x' < s,\; \frac{x'' - x'}{x_3 - x_1} < t\right].$

The range for $u$ is $(0, \infty)$ and $w$ varies between 0 and $\tfrac{1}{2}$, and thus we limit ourselves to $s$ varying from 0 to $\infty$, and $t$ varying in $(0, \tfrac{1}{2})$.

After some manipulation of the probability statement, and differentiating with respect to $s$ and $t$ under the integral sign, in a manner similar to that of the previous section, we obtain the joint density function of $u$ and $w$, given by

(6) $\frac{\partial^2 F_1(s, t)}{\partial s\, \partial t} = \frac{s}{t^2} \int_{-\infty}^{\infty} \left[ f\left(x, x + s, x + \frac{s}{t}\right) + f\left(x, x + \frac{s}{t} - s, x + \frac{s}{t}\right) \right] dx = f_1(s, t)$ (say).


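4. Applications to normal distributions. Let $f(x)$ in (4) be the density function of a normal distribution with mean $\theta$ and variance unity; then (6) reduces to

(7) $p(u, w) = \frac{2\sqrt{3}\, u}{\pi w^2} \exp\left[-\frac{u^2 (1 - w + w^2)}{3 w^2}\right].$

Further the marginal densities of $u$ and $w$ will be given by

(8) $p(u) = 6\sqrt{2}\, \phi\left(\frac{u}{\sqrt{2}}\right) \left[1 - G\left(\sqrt{\tfrac{3}{2}}\, u\right)\right],$

(9) $p(w) = \frac{3\sqrt{3}}{\pi (1 - w + w^2)}, \qquad 0 < w < \tfrac{1}{2},$

respectively.

The distribution of $w$ has been obtained by J. Lieblein in an unpublished paper.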
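
As a rough numerical cross-check, and assuming that the marginal density of $w$ is $3\sqrt{3}/[\pi(1 - w + w^2)]$ on $(0, \tfrac{1}{2})$ as equation (9) is read here, its integral can be compared with the empirical distribution of $w$ in simulated normal samples of three. The function names, seed and number of trials below are hypothetical.

```python
import math
import random

def w_cdf(w):
    """Integral of 3*sqrt(3) / (pi * (1 - t + t^2)) from 0 to w, for 0 <= w <= 1/2."""
    return (6.0 / math.pi) * (math.atan((2.0 * w - 1.0) / math.sqrt(3.0)) + math.pi / 6.0)

def closest_gap_ratio(sample):
    """w = (x'' - x') / (x3 - x1): smaller gap divided by the range."""
    x1, x2, x3 = sorted(sample)
    return min(x2 - x1, x3 - x2) / (x3 - x1)

rng = random.Random(1)
trials = 40000
ratios = [closest_gap_ratio([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(trials)]
empirical = sum(r < 0.25 for r in ratios) / trials
print(empirical, w_cdf(0.25))
```

The closed form integrates to one over $(0, \tfrac{1}{2})$, and the simulated proportion below $w = 0.25$ should lie close to the value of `w_cdf(0.25)`.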

From (2) we can also obtain the joint density function of $u = x'' - x'$ and $v = \frac{x' + x''}{2}$. When we integrate this joint density function with respect to $u$, we obtain the density function of $v = \frac{x' + x''}{2}$ as given by


p(v) = Ov^lV^O’ — 0)1 


( 10 ) 


i +1; 


/V2(r -_0)\ 

V vTi / 

— 2 [ (frix'jG 

JQ 


Sx 

7/| + t> - 0) cto 


V3 

The mean and the variance of the distribution of $v$ are given by $\theta$ and $\frac{1}{2} + \frac{\sqrt{3}}{4\pi}$ respectively.

It may be remarked that if there is a suspicion that one of the extreme observations in a sample of three does not belong to the normal population under consideration, then the median of the sample is a better estimate than the average of the two closest. The efficiency of the latter compared to that of the former is about 70%, for the variance of the median in this case is given by $1 - \frac{\sqrt{3}}{\pi}$ compared to $\frac{1}{2} + \frac{\sqrt{3}}{4\pi}$ of $v$, the average of the two closest. The efficiency is here defined as the ratio of the variances for the two estimates.
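
The two variances quoted in this remark can be checked by simulation. The following sketch draws samples of three from a standard normal population (so $\theta = 0$ is an assumed value) and compares the empirical variances of the median and of the average of the two closest with $1 - \sqrt{3}/\pi$ and $\tfrac{1}{2} + \sqrt{3}/(4\pi)$; function names, seed and sample count are hypothetical.

```python
import math
import random

def closest_pair_average(sample):
    """Average of the two closest of three observations."""
    x1, x2, x3 = sorted(sample)
    return 0.5 * (x1 + x2) if (x2 - x1) < (x3 - x2) else 0.5 * (x2 + x3)

def median_of_three(sample):
    return sorted(sample)[1]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2)
samples = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50000)]
var_closest = variance([closest_pair_average(s) for s in samples])
var_median = variance([median_of_three(s) for s in samples])
efficiency = var_median / var_closest

print(var_closest, 0.5 + math.sqrt(3.0) / (4.0 * math.pi))
print(var_median, 1.0 - math.sqrt(3.0) / math.pi)
print(efficiency)
```

The ratio of the two theoretical variances is roughly 0.70, in line with the efficiency stated in the text.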


ERRATA 
By W. Feller 
Cornell University 

The author regrets the following inconsequential, but very disturbing, slips in his paper “On the Kolmogorov-Smirnov limit theorems for empirical distributions” (Annals of Math. Stat., Vol. 19 (1948), pp. 177-189):

(1) In equation (1.4) on p. 178, the exponent $-\nu^2 z^2$ should be replaced by $-2\nu^2 z^2$. The same copying error occurs in the description of Smirnov’s table on p. 279. The proof is correct as it stands.

(2) In the formulation of the continuity theorem on p. 180 it is claimed that $u_k \to f(t)$, whereas in reality the continuity theorem permits only the conclusion that

(*) $\delta \sum_{r=1}^{k} u_r \to \int_0^t f(x)\, dx.$

This slip in formulation in no way affects the proofs since only (*) is used.
(The assertion that the step functions $\{u_k\}$ converge pointwise is not based on a second application of the continuity theorem, but on the obvious fact that (*) implies


I qlr)J(x) (hr, 


where the step function $\{g_r\}$ converges uniformly to a continuous monotonic $g(x)$.)

The following corrections apply to the paper, “On the normal approximation to the binomial distribution” (Annals of Math. Stat., Vol. 16 (1945), pp. 319-329).

(1) Equation (27) gives two variants of an estimate for the error $\rho$. The second should simply restate the first one in terms of the variable $x$; in other words, the expression $(p^3 + q^3)$ in the second line of (27) should be replaced by $p^3(1 - px/\sigma)^{-3} + q^3(1 + qx/\sigma)^{-3}$.

(2) The estimate p < tr' °/300 given in (28) is not valid over the entire range 
for which it is claimed. However, the further theory depends only on the fact 
that p = 0(<T 4 ), and the estimate, p < <r‘*/30 is both correct and sufficient for 
our purposes. (Actually, no changes whatever are required in the proofs, since (28) is used explicitly only for a range where it is correct as stated.)

(3) On p. 324 it is stated that under the conditions of the main theorem (p. 325) $k > 4$, $n - k > 4$, whereas in reality the value 3 can occur in extreme cases. Fortunately, the assertion is not used anywhere in the proof, and the error $\rho$ is negligible in all cases.

Accordingly, no changes are required either in the formulation or the proof of the theorems. I am indebted to Dr. W. Hoeffding for calling my attention to the slips.

(4) The first minus sign in footnote 5 should be an equality sign and the second minus in (70) a plus.


ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Chapel Hill meeting of the Institute, March 17-18, 1950)

1. A Method of Estimating the Parameters of an Autoregressive Time Series.
S. G. Ghurye, University of North Carolina.

The general autoregressive process of the second order is defined by the equations

$x_t = X_t + \eta_t,$

$X_t + a_1 X_{t-1} + a_2 X_{t-2} = \epsilon_t,$

where $x_t$ is the value actually observed at time $t$, $X_t$ the corresponding theoretical value, $\epsilon_t$ the disturbance and $\eta_t$ the superposed variation. The estimates of $a_1$, $a_2$ given by Yule’s method are biased and inconsistent if $\eta_t$ is not identically zero, the permanent bias being a function of the unknown variance of $\eta_t$. The present paper proposes a method of estimation



which is unaffected by the presence of $\eta_t$, and seems to be better than any other known method; and this conjecture is supported by the results of application to observational and artificial series. In this method the estimates $a_1$, $a_2$ are obtained by minimizing

" 1 (w-i 

*“3 A/).— *ij ^ Xl 1 (k <**®uaH*i4* + GiXi+i-i + aiXi+k-?) 

where $n$ is some number small in comparison with $N$ (which is the number of observations). In the above expression the usual approximation of substituting (N - k - 2)u for -‘^ 3 - t,t, a may be made for computational convenience. The method has been used for fitting autoregressive processes to the series of annual averages of Wolfer’s sunspot numbers and that of Myrdal’s Swedish cost of living index numbers. The method is applicable to higher order processes.

2. Most Powerful Rank Order Tests. (Preliminary Report). Wassily Hoeffding, University of North Carolina.

Let $X_{11}, \ldots, X_{1n_1}, \ldots, X_{k1}, \ldots, X_{kn_k}$ be random variables with a joint probability function $P(S)$ and let $P\{X_{ig} = X_{ih}\} = 0$ if $g \ne h$ ($i = 1, \ldots, k$). Let $H_0$ be a hypothesis which implies that $P(S)$ is invariant under all permutations of $X_{i1}, \ldots, X_{in_i}$ ($i = 1, \ldots, k$). Let $r_{ij}$ ($j = 1, \ldots, n_i$) be the ranks of $X_{i1}, \ldots, X_{in_i}$. Under $H_0$ the $M = \prod n_i!$ rank permutations $R = (r_{11}, \ldots, r_{1n_1}, \ldots, r_{k1}, \ldots, r_{kn_k})$ have the same probability $P(R) = M^{-1}$. A test which depends only on the permutations $R$ is called a rank order test (R.O.T.). A R.O.T. of size $m/M$ which is most powerful (M.P.) against a simple alternative, $H_1$, is determined by $m$ permutations $R$ for which $P_1(R)$ takes on its $m$ largest values.

For example, let the pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent and identically distributed. Let $H_0$ state that $X_i$, $Y_i$ are independent, and let $H_1(\rho)$ be the hypothesis that $X_i$, $Y_i$ have a bivariate normal distribution with correlation $\rho$. We may assume that $X_1 < \cdots < X_n$ and consider the ranks $r_i$ of the $Y$’s only. A R.O.T. which is uniformly M.P. against all $H_1(\rho)$ with $\rho > 0$ does not exist except for small $n$. The M.P. R.O.T. against small $\rho > 0$ is determined by the largest values of $\sum_{i=1}^{n} (EZ_i)(EZ_{r_i})$, where $EZ_i$ is the expectation of the $i$-th order statistic in a sample of $n$ from a standard normal distribution. The M.P. unbiased R.O.T. against small values of $|\rho|$ is based on the statistic $\sum_{i,j} (EZ_i Z_j)(EZ_{r_i} Z_{r_j})$. The M.P. R.O.T. against $\rho$ close to 1 is obtained by expanding the probability of $(r_1, \ldots, r_n)$ in powers of $\{(1 - \rho)/(1 + \rho)\}^{1/2}$.

3. The Comparison of Percentages in Matched Samples. William G. Cochran, 
Johns Iloplcins Univeisity. 

In this paper the familiar $\chi^2$ test for comparing the percentages of successes in a number of
independent samples is extended to the situation in which each member of any sample is
matched in some way with a member of every other sample. This problem has been encountered
in the fields of psychology, pharmacology, bacteriology, and sample survey design.
A solution has been given by McNemar (1949) when there are only two samples.

In the more general case, the data are arranged in a two-way table with r rows and c
columns, in which each column represents a sample and each row a matched group. The
test criterion proposed is

$$Q = \frac{c(c - 1) \sum_j (T_j - \bar{T})^2}{c\left(\sum_i u_i\right) - \left(\sum_i u_i^2\right)},$$

where $T_j$ is the total number of successes in the $j$-th sample and $u_i$ the total number of
successes in the $i$-th row. If the true probability of success is the same in all samples, the
limiting distribution of Q, when the number of rows is large, is the $\chi^2$ distribution with
(c - 1) degrees of freedom. The relation between this test and the ordinary $\chi^2$ test, valid
when samples are independent, is discussed.
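
As an illustration of the criterion Q, the sketch below computes it from a table of 0/1 outcomes with one row per matched group and one column per sample. The function name and the input layout are assumptions made for this example only.

```python
def cochran_q(table):
    """Q for a list of rows (matched groups), each a list of 0/1 outcomes per sample."""
    c = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(c)]  # T_j
    row_totals = [sum(row) for row in table]                       # u_i
    t_bar = sum(col_totals) / c
    num = c * (c - 1) * sum((t - t_bar) ** 2 for t in col_totals)
    den = c * sum(row_totals) - sum(u * u for u in row_totals)
    if den == 0:
        # Happens only when every row is all successes or all failures.
        raise ValueError("Q is undefined for this table")
    return num / den
```

With five matched groups and three samples, `cochran_q([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0]])` gives 14/3, to be referred to the $\chi^2$ distribution with 2 degrees of freedom when the number of rows is large.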

In small samples the exact distribution of Q can be constructed by regarding the row
totals as fixed, and by assuming that on the null hypothesis every column is equally likely
to obtain one of the successes in a row. This exact distribution is worked out for eight
examples in order to test the accuracy of the $\chi^2$ approximation to the distribution of Q in
small samples. The number of samples ranged from c = 3 to c = 5. The average error in the
estimation of a significance probability was about 14 per cent in the neighborhood of the
5 per cent level and about 21 per cent in the neighborhood of the 1 per cent level. Correction
for continuity did not improve the accuracy of the approximation, although it is recommended
when there are only two samples. Another approximation, obtained by scoring each
success as “1” and each failure as “0” and performing an analysis of variance on the data,
was also investigated. The F-test, corrected for continuity, performed about as well as the
$\chi^2$ approximation (uncorrected), but is slightly more laborious.

The problem of subdividing $\chi^2$ into components for more detailed tests is briefly
discussed.

4. A Method of Estimating Components of Variance in Disproportionate Numbers.
H. L. Lucas, North Carolina State College.

By including sufficient effects in the forward solution of the Abbreviated Doolittle
method, components of variance may be estimated from disproportionate data. The
procedure is very systematic, and thus is adaptable to routine computational work. The
computations will be described, and the utility of the method briefly discussed.

5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying 
the Values of Two Parameters. (Preliminary Report). Stanley L. Isaacson, 
Columbia University. 

In the Neyman-Pearson theory of testing simple hypotheses, in the one-parameter case,
a locally best unbiased region is called “type A.” It is obtained by maximizing the curvature
of the power curve at the point $\theta = \theta_0$ specified by the hypothesis, subject to the
conditions of size and unbiasedness. For the two-parameter case, Neyman and Pearson considered
“type C” regions (Stat. Res. Mem., vol. 2 (1938), p. 36). The definition of these regions
requires one to choose in advance a family of ellipses of constant power in an infinitesimal
neighborhood of the point $(\theta_1, \theta_2) = (\theta_1^0, \theta_2^0)$ specified by the hypothesis.
The natural generalization of a “type A” region is a “type D” region, which maximizes the Gaussian
curvature of the power surface at $(\theta_1^0, \theta_2^0)$, subject to the conditions of size and
unbiasedness. This definition does not require one to choose a family of ellipses in advance. This
approach leads to a new problem in the calculus of variations. A sufficient condition is
obtained which plays the role of the Neyman-Pearson fundamental lemma in the “type A”
case. An illustrative example is given. (Prepared under sponsorship of the Office of Naval
Research.)

6. A Note on Orthogonal Arrays. Raj Chandra Bose, University of North
Carolina. 

Consider a matrix $A = (a_{ij})$ with N rows and m columns, each element $a_{ij}$ standing for
one of the s integers 0, 1, 2, ..., s - 1. Let us take the partial matrix obtained by choosing
any $t \le m$ columns of A. Each row now consists of an ordered t-plet of numbers, and since each
element has one of s possible values, there are $s^t$ possible t-plets. The matrix A may be
called an orthogonal array (N, m, s, t) of size N, m constraints, s levels and strength t, if by
choosing any t columns whatsoever every possible t-plet occurs the same number of times.
Clearly $N = \lambda s^t$ where $\lambda$ is an integer. Such arrays have been considered by Rao and are
useful for various experimental designs. The existence of an orthogonal array $(s^2, m, s, 2)$ is
equivalent to the existence of a set of orthogonal Latin squares of side s and m constraints
(i.e., the number of Latin squares in the set is m - 2). The fundamental question that can
be asked regarding orthogonal arrays is the following: What is the maximum number of
constraints for an orthogonal array, given N, s and t? Denote this number by f(N, s, t);
then from known properties of Latin squares $f(s^2, s, 2) = s + 1$, if s is a prime or a prime
power, and a theorem by Mann states that $f(s^2, s, 2) \ge r + 1$, if $s = p_1^{n_1} \cdots p_k^{n_k}$, where
$p_1, \dots, p_k$ are different primes, and r is the minimum of $p_1^{n_1}, \dots, p_k^{n_k}$. The following
generalisation of Mann's theorem is proved in this note:

$$f(N_1 N_2 \cdots N_k,\ s_1 s_2 \cdots s_k,\ t) \ge \min\{f(N_1, s_1, t),\ f(N_2, s_2, t),\ \dots,\ f(N_k, s_k, t)\}.$$
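
The defining property of an orthogonal array can be checked directly from the definition. The sketch below is an illustration only (the function name is hypothetical); it tests every choice of t columns by brute force, so it is intended for small arrays.

```python
from collections import Counter
from itertools import combinations, product

def is_orthogonal_array(rows, s, t):
    """True if rows (tuples over 0..s-1) form an orthogonal array of strength t."""
    n_rows, m = len(rows), len(rows[0])
    if n_rows % s ** t != 0:
        return False  # N must equal lambda * s**t for an integer lambda
    lam = n_rows // s ** t
    for cols in combinations(range(m), t):
        counts = Counter(tuple(row[j] for j in cols) for row in rows)
        # Every one of the s**t possible t-plets must occur exactly lambda times.
        if any(counts[tup] != lam for tup in product(range(s), repeat=t)):
            return False
    return True
```

For example, the four rows (0,0,0), (0,1,1), (1,0,1), (1,1,0) form an orthogonal array (4, 3, 2, 2): each pair of columns shows each of the four ordered pairs exactly once.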

7. Transformations Related to the Angular and the Square Root. Murray F.
Freeman and John W. Tukey, Princeton University.

The use of transformations to stabilize the variance of binomial or Poisson data is
familiar (Anscombe, Bartlett, Curtiss, Eisenhart). The comparison of transformed binomial
or Poisson data with percentage points of the normal distribution to make approximate
significance tests or to set approximate confidence intervals is less familiar. Mosteller and
Tukey have recently made a graphical application of a transformation related to the
square-root transformation for such purposes, where the use of “binomial probability
paper” avoids all computation. We report here on an empirical study of a number of
approximations, some intended for significance and confidence work, and others for variance
stabilization. (Prepared in connection with research sponsored by the Office of Naval
Research.)

8. Standard Inverse Matrices for Fitting Polynomials. F. J. Verlinden, North
Carolina State College. 

For fitting polynomials of the type $y = b_0 x^0 + b_1 x + b_2 x^2 + \dots + b_m x^m$, with the x's
equally spaced, published tables of orthogonal polynomials may be used. This procedure
does not yield the b's directly, nor their variances or covariances, although such may be
obtained by proper computations which are moderately tedious. In some types of statistical
work, the b's and their variances and covariances may be desired. These may of course be
obtained directly by the method of least squares but the computational work is prodigious
relative to that for the orthogonal polynomial approach. When the x's are equally spaced
the elements of the variance-covariance matrix may be put in the simple form of sums of
powers (including the zero power) of successive integers from zero to n (n equals one less
than the number of observations). The elements of the inverses of matrices of this type
have been worked out algebraically in terms of n for polynomials up to and including the
quintic (m = 5). With these standard inverse matrices, the b's and their variances and
covariances may quickly be obtained once the elements are evaluated numerically. These
elements have been evaluated numerically up to n = 20.
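
To make the structure of these matrices concrete, the following sketch builds the matrix of sums of powers of the integers 0, 1, ..., n, inverts it exactly with rational arithmetic, and uses the inverse to obtain the b's. It is an illustration under stated assumptions, not the author's algebraic tables: the function names are hypothetical, the x's are taken to be the integers 0 to n themselves, and n >= m is assumed so that the matrix is non-singular.

```python
from fractions import Fraction

def power_sum_matrix(n, m):
    """(m+1) x (m+1) matrix whose (j, k) entry is the sum of x**(j+k), x = 0..n."""
    return [[Fraction(sum(x ** (j + k) for x in range(n + 1)))
             for k in range(m + 1)] for j in range(m + 1)]

def invert(a):
    """Exact Gauss-Jordan inverse of a square, non-singular matrix of Fractions."""
    size = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(a)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def fit_polynomial(ys, m):
    """Least-squares b_0..b_m for observations ys taken at x = 0, 1, ..., n."""
    n = len(ys) - 1
    inv = invert(power_sum_matrix(n, m))
    rhs = [sum(Fraction(y) * x ** j for x, y in enumerate(ys))
           for j in range(m + 1)]
    return [sum(inv[j][k] * rhs[k] for k in range(m + 1)) for j in range(m + 1)]
```

For a straight line fitted to three observations (n = 2, m = 1) the inverse is [[5/6, -1/2], [-1/2, 1/2]]. Under the usual least-squares model with uncorrelated errors of equal variance, multiplying this inverse by the error variance gives the variances and covariances of the b's.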

9. Mathematical Models in Biology. J. A. Rafferty, Department of Biometrics, 
School of Aviation Medicine, Randolph Field, Texas. 

From the point of view of a bio-medical research administrator, mathematical models
will assume a greater role in biological research than heretofore. In anticipation of this
trend, certain philosophical implications of models in biological theory and scientific theory
in history are examined. A hierarchy of abstraction-levels in biology is delineated, and the
role of mathematical models at these levels is illustrated by examples from the literature.
Proposals are made for a concentration of mathematical effort on certain important
biological problems. Remarks are made on the capabilities and limitations of models in biology.

10. Small Sample Performance of Biological Statistics. Irwin Brush, Johns 
Hopkins University. 

In this paper the dilution method for estimating bacterial density is investigated by an
exact small sample method and also by an approximate one. Methodologies and design of
experiments are compared for various small sample cases.

11. Methodology in the Study of Physical Measurements of School Children.
B. G. Greenberg and A. Hughes Bryan, University of North Carolina.

In a series of investigations to determine by small-sampling technique what physical
differences, if any, occur between children of differing socio-economic backgrounds, several
problems of methodology arose. A pilot study was undertaken to assure maximum efficiency
at each step. This paper reports some of these results. It was found that the children could
remain dressed (with the exception of boys' bi-iliac measurement) without changing the
magnitude of the differences. The pilot study enabled us to decide how many observers to
use, and how much duplication of measurements by them was necessary. Minimum sample
sizes were estimated to indicate physical differences of predetermined magnitudes. It was
found that the age grouping 96-143 months was optimal from the standpoint of indicating
physical differences between children of differing socio-economic levels. Boys and girls in 
the upper socio-economic levels were both taller and heavier for their age in this age group. 
There were no weight differences, however, when weight was adjusted for age and height. 
Measurement of the bi-iliac and transverse chest diameter provided little additional in¬ 
formation on physical differences. The calf circumference, an indicator of muscle mass and 
subcutaneous fat, is suggested as being a sensitive supplementary index to indicate physi¬ 
cal differences when age and height are adjusted. 

12. Tetrad Analysis in Yeast. A. S. Householder, Oak Ridge National Laboratory,
Oak Ridge, Tennessee.

In Neurospora all four products of meiosis are recovered in the four spores of an ascus.
In crosses AB X ab the asci are of three types, designated I, II or III according as all four,
none, or two spores resemble parents. Frequencies of these types, $P$, $P'$ and $P''$, are the
observables. If there were no exchange $P''$ would be zero; and one should have $P' = 0$
or $\tfrac{1}{2}$ according to whether the loci were on the same or different chromosomes.

Assuming only that no exchange occurs between sister chromatids and neglecting
chromatid interference, one can calculate without further assumptions a frequency $P''$ of
exchanges between a single locus and its centromere from data on three or more genes taken
in pairs by equations

$$s_{ij} = s_{0i} s_{0j}, \qquad P'' = 2(1 - s)/3,$$

where the subscript 0 refers to a centromere. Lindegren makes such calculations from his
own data, by taking groups of three, but makes no effort to reconcile discrepancies.
Neyman's modified chi-square, however, permits combining all observations in a set of
equations that yields easily to rapidly converging iterative solution. The equations are

$$2 s_i \sum_{j \ne i} s_j^2 (n_{ij} + n'_{ij})^2 (n_{ij}^{-1} + n'^{-1}_{ij}) = \sum_{j \ne i} s_j (n_{ij} + n'_{ij})^2 (2 n_{ij}^{-1} - n'^{-1}_{ij}),$$

where $n'_{ij}$ is the number in class I and II combined for the loci i and j, $n_{ij}$ the number
in class III, and only those pairs (i, j) are included which are found to be independent.
The argument of A. R. G. Owen (Proc. Roy. Soc., Ser. B, Vol. 136 (1949), pp. 67-94)
can be paraphrased for the present case and a suitable generating function $P(\lambda, u)$ is being
sought providing a metric. The specific one proposed by Owen is ruled out since
$s = P(-\tfrac{1}{2}, u)$ takes on a negative value for one locus, which is not possible with Owen's function.


13. Contribution to the Probabilistic Theory of Neural Nets. I. Randomization 
of Refractory Periods and of Stimulus Intervals. Anatol Rapoport, University 
of Chicago. 

Aggregates of neurons are considered in which the frequency of occurrence of neurons 
with a specified value of the refractory period follows certain probability distributions.
Input-output functions are derived from such aggregates. In particular, if input and output
intensities are defined in terms of stimulus frequencies and firing frequencies per neuron
respectively, it is shown that a rectangular distribution of refractory periods leads to a
logarithmic input-output curve. If input and output are defined in terms of the total
number of stimuli and firings in the aggregate, it is shown how the “mobilization” picture
leads to the logarithmic input-output curve.

By randomizing the intervals between stimuli received by a single neuron and by
introducing an inhibitory neuron a very simple “filter net” can be constructed whose output
will be sensitive to a particular range of the input, and this range can be made arbitrarily
small.


14. Theoretical and Experimental Aspects in the Removal of Air-Borne Matter 
by the Human Respiratory Tract. H. D. Landahl, University of Chicago.

The principal factors governing the fate of a particle in the respiratory tract are
impaction due to inertia, settling due to gravity and Brownian movements. For a given
respiratory pattern, it is possible to calculate the probable fate of a particle from a knowledge of
the geometry of the passages. These calculations have been carried out in such a manner as
to obtain the theoretical amounts of material deposited in various regions of the lungs as
well as the relative amounts in various fractions of the expired air. Similarly, it is possible
to estimate the probable fate of a particle which passes through the nasal passages.
Experiments have been carried out to verify a number of these predictions. On the whole, the
agreement, as illustrated in the slides, is fairly satisfactory when one considers the
complexity of the calculations.

15. An Application of Biometrics to Zoological Classification. F. M. Wadley, 
Navy Department, Washington, D. C.

Statistical problems in taxonomy are discussed; attention must be paid to variation of
individuals as well as of group means. Covariance analysis and the discriminant function
technique are applied to multiple measurements in groups of molluscan fossils.

16. The Analysis of Hematological Effects of Chronic Low-Level Radiation.
Jack Moshman, United States Atomic Energy Commission, Oak Ridge,
Tennessee.





Several methods are investigated for analyzing the possible effects of chronic low-level
irradiation upon the employees of the operating contractors of the US AEC. The effects
investigated are those on the red blood count, hemoglobin, white blood count, lymphocytes
and neutrophils. The analysis includes measurements of significant differences among
individuals, geographic sites and the exploration of various indices of exposure to radiation.
A non-parametric determination of trend values for individuals which may be applied
to mass data is considered.

17. Statistical Problems in Psychological Testing. Edward E. Cureton, Uni¬ 
versity of Tennessee 

Though great progress has been made in mathematical statistics in recent years, a number
of the major statistical problems encountered in the development and use of psychological
tests remain unsolved. Some of these problems are outlined, with particular reference to
the mathematical models and assumptions implied by psychological theory, by the nature
of the experimental data, and by the conditions under which the results and findings are
to be applied.


18. Accuracy of a Linear Prediction Equation in a New Sample. George E.
Nicholson, Jr., University of North Carolina. 


The problem considered is as follows. Given two samples $S_1$ and $S_2$ of $N_1$ and $N_2$
observations on a p + 1 character random variable $(y, x_1, \dots, x_p)$. Let $Y_1$ and $Y_2$ be the
linear regression equations computed by the method of least squares from each sample. The effect of
using $Y_1$ to predict the y's in $S_2$ is considered. The ratio

$$\frac{\sum (y_2 - Y_2)^2}{\sum (y_2 - Y_1)^2}$$

is used as a measure of the predicting efficiency of $Y_1$ in $S_2$ relative to $Y_2$ when the
$x_i$ are fixed for the usual regression model. The general multivariate case is also considered.


19. Independence of Quadratic Forms in Normally Correlated Variables. Yuki- 
yosi Kawada, Tokyo University of Literature and Science, Tokyo, Japan. 

An extension is given of theorems of Craig, Hotelling and Matérn which includes the
following theorem, proved by a new method: If two quadratic forms $Q_1$, $Q_2$ in normally and
independently distributed variates with zero means and unit variances satisfy the four
conditions $E(Q_1^i Q_2^j) = E(Q_1^i) E(Q_2^j)$, for i, j = 1, 2, then the product of the matrices of the
two forms in either order is zero.


20. Bounds on the Distribution of Chi-square. S. A. Vora, University of North 
Carolina. 

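Let

$$\chi^2 = \sum_{i=1}^{k} (\nu_i - np_i)^2/np_i, \qquad \chi'^2 = \sum_{i=1}^{k} (\nu_i + \tfrac{1}{2} - Np_i)^2/Np_i,$$

where $\nu_i \ge 0$, $\sum_{i=1}^{k} \nu_i = n$, $p_i > 0$, $\sum_{i=1}^{k} p_i = 1$ and $N = n + k/2$. Bounds on the
multinomial probability T in terms of $\chi'^2$ are obtained. A triangular transformation of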
$$x_i = (\nu_i + \tfrac{1}{2} - Np_i)/\{Np_i(1 - p_i)\}^{1/2} \qquad (i = 1, \dots, k - 1)$$

to $y_i$ is applied so that


/ -1 



where d is determined later by equating the coefficients of $\chi'^2$. Certain rectangles $r(\nu)$
with $(y_1, \dots, y_{k-1})$ as a mid-point are non-overlapping and cover the entire space $R_{k-1}$
for $\nu_i = 0, \pm 1, \pm 2, \dots$. If $\chi'^2 \le c$, then bounds on T in terms of the integral of the (k - 1)
dimensional normal frequency function over the rectangle $r(\nu)$ are obtained. Prob $\{\chi'^2 \le c\}$
is the sum of T over $\chi'^2 \le c$, so the integral over the sum of rectangles whose mid-points
lie within the hypersphere $\chi'^2 \le c$ is considered. Two hyperspheres, one which contains the
sum of those rectangles, and one which is contained in it, are used for the bounds, giving

$$\lambda_1 F_{k-1}(c_1) \le \text{Prob}\{\chi'^2 \le c\} \le \lambda_2 F_{k-1}(c_2),$$

where $F_{k-1}(u)$ is a chi-square distribution function with (k - 1) degrees of freedom and
$\lambda_1, \lambda_2, c_1, c_2$ are functions of c, n, k and $p_1, \dots, p_k$. As $n \to \infty$, both bounds tend to
$F_{k-1}(c)$. Bounds of the same form are obtained for Prob $\{\chi^2 \le c\}$. Closer bounds
for Prob $\{\chi^2 \le c\}$ are given in terms of a non-central chi-square distribution.
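
The two statistics defined at the start of this abstract differ only in the half-unit shift of each cell count and the replacement of n by N = n + k/2. A minimal sketch of their computation follows; the function name is hypothetical and the bounds themselves are not reproduced.

```python
def chi_square_pair(counts, probs):
    """Return (chi2, chi2_prime) for observed cell counts and cell probabilities."""
    k, n = len(counts), sum(counts)
    big_n = n + k / 2  # N = n + k/2
    chi2 = sum((v - n * p) ** 2 / (n * p) for v, p in zip(counts, probs))
    chi2_prime = sum((v + 0.5 - big_n * p) ** 2 / (big_n * p)
                     for v, p in zip(counts, probs))
    return chi2, chi2_prime
```

For counts (3, 7) with equal cell probabilities, n = 10 and N = 11, so the ordinary statistic is 1.6 and the shifted statistic is 16/11.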

21. Estimation of Genetic Parameters. C. R. Henderson, Cornell University. 


Many applications of genetics and statistics to the improvement of plants and animals
deal with experimental data for which the underlying model is assumed to be

$$y_\alpha = \sum_{i=1}^{p} b_i x_{i\alpha} + \sum_{i=1}^{q} u_i z_{i\alpha} + e_\alpha,$$

where $b_i$ are unknown fixed parameters, $x_{i\alpha}$ and $z_{i\alpha}$ are observable parameters, the $u_i$ are
a random sample from a multivariate normal distribution with means zero and covariance
matrix $\|\sigma_{ij}\|$, and the $e_\alpha$ are normally and independently distributed with means zero
and variances $\sigma_\alpha^2$. If $\sigma_{ij} = 0$ when $i \ne j$ and if $\sigma_\alpha^2 = \sigma_e^2$, the model is the one usually
assumed when components of variance are estimated.

Three different estimation problems are involved: (1) estimation of $b_i$ under the
assumptions of the model, (2) estimation of $u_i$, and (3) estimation of $\sigma_{ij}$. The first two problems
are not solved satisfactorily by the least squares procedure in which the $u_i$ are regarded
as fixed, but the maximum likelihood solution does lead to a satisfactory estimation
procedure.

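Assuming that the $\sigma_{ij}$ and $\sigma_\alpha^2$ are known, the joint maximum likelihood estimates of
$b_i$ and $u_i$ are the solution to the set of linear equations

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{h\alpha} x_{i\alpha}/\sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sum_\alpha x_{h\alpha} z_{i\alpha}/\sigma_\alpha^2\Big) = \sum_\alpha x_{h\alpha} y_\alpha/\sigma_\alpha^2, \qquad h = 1, \dots, p,$$

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{i\alpha} z_{h\alpha}/\sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sigma^{hi} + \sum_\alpha z_{i\alpha} z_{h\alpha}/\sigma_\alpha^2\Big) = \sum_\alpha z_{h\alpha} y_\alpha/\sigma_\alpha^2, \qquad h = 1, \dots, q,$$

where $\sigma^{hi}$ denotes an element of the inverse of the matrix $\|\sigma_{ij}\|$.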

Some important applications of this estimation procedure to genetic studies are described
and certain computational short-cuts are suggested.

The problem of estimating $\sigma_{ij}$ has not been solved satisfactorily although under certain
quite general assumptions the equations for the joint estimation of $b_i$, $u_i$, $\sigma_{ij}$, and $\sigma_\alpha^2$
can easily be written. The solution to the equations, however, is too difficult to make the
procedure practical. Nevertheless unbiased estimates of $\sigma_{ij}$ can be obtained by equating
to their expected values the differences between certain reductions in sums of squares
computed by least squares and solving for the $\sigma_{ij}$. In general, the expectation of the
reduction due to $b_1, \dots, b_p, u_1, \dots, u_k$ $(k \le q)$ is $\sum_i \sum_j d^{ij} E(Y_i Y_j)$, where $d^{ij}$ are the elements
of the matrix which is the inverse of the $(p + k)^2$ matrix of coefficients and the $Y_i$ are the
right members of the least squares equations.

22. Estimating the Mean and Standard Deviation of Normal Populations from
Double Truncated Samples. A. C. Cohen, Jr., University of Georgia.

The method of maximum likelihood is employed to obtain estimates of the mean and
standard deviation of a normally distributed population from double truncated random
samples. Two cases are considered. In the first, the number of missing variates is assumed
to be unknown. In the second, the number of missing (unmeasured) variates in each tail is
known. Variances for the estimates involved in each case are obtained from the maximum
likelihood information matrices. A numerical example is given to illustrate the practical
application of the estimating equations obtained for each of the two cases considered.

23. Minimax Estimates of Location and Scale Parameters. Gopinath Kallianpur,
University of North Carolina.


If the joint fr. f. of the random variables $X_1, \dots, X_N$ contains only a scale parameter
and is of the form


$$\frac{1}{\alpha^N}\, p\left(\frac{x_1}{\alpha}, \dots, \frac{x_N}{\alpha}\right),$$


then under mild restrictions the following theorem is proved:

Theorem 1. If the loss function is of the form $W\left(\frac{\bar{\alpha} - \alpha}{\alpha}\right)$, the best or minimax
estimate $\alpha_0(x)$ of $\alpha$ minimizes


>(:• ■ •?) 


with respect to $\bar{\alpha}$, and further,

$$\alpha_0(\mu x_1, \dots, \mu x_N) = \mu\, \alpha_0(x_1, \dots, x_N), \qquad \mu > 0.$$

When both location and scale parameters are present and the joint fr. f. is of the form


$$\frac{1}{\alpha^N}\, p\left(\frac{x_1 - \theta}{\alpha}, \dots, \frac{x_N - \theta}{\alpha}\right),$$


(under conditions similar to those in Theorem 1) we obtain two results for the estimation
of $\theta$ and $\alpha$, respectively, one of which is


Theorem 2. If the loss function is of the form W




the best estimate $\theta_0(x)$ of $\theta$ minimizes


r r »■(•—• ...^w 

J- K J 0 \ or / a N \ a a / 

, „ (li + A tjv + 6n(X( , 

and 0 0 I — — , ■ ■ , —-—I =-- 


, Xv) + k 


These theorems have been applied to derive minimax estimates in the case of standard
distributions. Finally, the problem of estimating the difference between the location
parameters of two populations is briefly considered. The results obtained in this paper are
a continuation of the line of approach suggested in a theorem of Wald's “Contributions
to the Theory of Statistical Estimation and Testing Hypotheses” (Annals of Math. Stat.,
Vol. 10 (1939), pp. 299-326). (The present work was carried out under Office of Naval
Research contract.)

24. On Some Features of the Neyman-Pearson and the Wald Theories of Statis¬ 
tical Inference, Their Interrelations and Their Bearing on Some Usual Problems 
of Statistical Inference. S. N. Roy, University of North Carolina.

With two alternative hypotheses $H_1$ and $H_2$ it is shown that (i) the most powerful test
of $H_1$ with respect to $H_2$ is automatically an unbiased test in the sense that its power is
never less than (and usually greater than) the level of significance $\alpha$ and (ii) there is also
a least powerful test with its power not greater (usually less) than $\alpha$. This means that all
tests have powers lying in between, which gives a complete picture of the possible family
of tests and provides a basis for defining efficiency of tests.

With the first kind of error $\alpha$ is tied up a minimum second kind of error $\beta$
(complementary to the maximum power P), and the level at which $\alpha$ is fixed depends upon some
compromise between $\alpha$ and $\beta$. This intuitive approach is formalised by the introduction of
loss functions related to and a priori probability weights for $H_1$ and $H_2$, thus leading to
the first stage in the Wald treatment of dichotomy with two solutions in the observation
space corresponding respectively to minimum and maximum total risks. This is immediately
generalised to the first stage in the Wald treatment of multichotomy with minimum
and maximum total risk solutions. An important special case is discussed in which all the
possible alternatives to a particular hypothesis are, by our test procedure, indistinguishable
among themselves, thus effectively forming only one alternative to the hypothesis,
which means a degenerate multichotomy. The bearing of this on most powerful tests on
an average under the Neyman-Pearson theory is also discussed.

The problem of testing a composite hypothesis which is usually treated in terms of the
Neyman-Pearson theory is posed and treated in terms of the (first stage) Wald theory and
an indication is given of how those notions could be applied to the usual problems of
univariate and multivariate analysis.

25. Note on Uniformly Best Unbiased Estimates. R. C. Davis, Naval Ordnance
Test Station, Inyokern, California. 


For the estimation in an absolutely continuous probability distribution of an unknown
parameter which does not possess a sufficient statistic, it is shown that no unbiased
estimate for the unknown parameter exists which attains minimum variance uniformly over
a parameter set of arbitrary nature. This result demonstrates the impossibility of obtaining
a generalized sufficient statistic first proposed by Bhattacharyya. Although not used
in this note it is surmised that Barankin's powerful results on locally best unbiased
estimates can be applied to yield further results in this direction.

26. Competitive Estimation. Herbert Robbins, University of North Carolina. 

Let $\theta$ be a vector random variable with distribution function $G(\theta)$ and let x be a vector
random variable whose frequency function $f(x; \theta)$ depends on $\theta$. Two statisticians, A and B,
are required to estimate $\theta$ from the value of x. If A's estimate is closer to $\theta$ he wins one
dollar from B, and vice versa; in case of a tie no money changes hands. It is shown that A
should estimate $\theta$ by the function a(x) = median of posterior distribution of $\theta$ given x;
his expected gain will then be $\ge 0$ whatever estimate B may use. If $G(\theta)$ is not known to A
he should estimate it from the series of values of $\theta$ which have been observed in previous
trials. If these are not known, A should estimate $G(\theta)$ from the values of x which have
previously occurred; how this may be done is discussed elsewhere (see Abstract 35).

From the point of view of the theory of games, when $G(\theta)$ is unknown we have a game in
which the “rules” are unknown and must be successively estimated from past experience.
Other examples arise whenever a game involves random devices whose probability
distributions are not known to the players but must be inferred by statistical methods, in
general from secondary variables which contain only part of the total information. The
role of statistical inference in such “long term” games is fundamental.
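
The posterior-median rule recommended to statistician A can be illustrated in the simplest setting. The sketch below assumes a scalar parameter taking finitely many values with known prior weights (a simplification of the vector case in the abstract); the function name, the three-point prior and the binomial likelihood in the usage note are all hypothetical choices made for the example.

```python
def posterior_median(thetas, prior, likelihood, x):
    """Smallest theta at which the posterior c.d.f. of theta given x reaches 1/2."""
    weights = [g * likelihood(x, th) for th, g in zip(thetas, prior)]
    total = sum(weights)
    cum = 0.0
    for th, w in sorted(zip(thetas, weights)):
        cum += w / total  # posterior c.d.f. evaluated at th
        if cum >= 0.5:
            return th
```

With success probabilities 0.2, 0.5, 0.8 equally likely a priori and x successes observed in five trials, so that `likelihood = lambda x, th: th ** x * (1 - th) ** (5 - x)`, the rule returns 0.2 for x = 1, 0.5 for x = 2 and 0.8 for x = 4.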

27. The Effect of an Unknown ‘Location Disturbance’ on “Student's” t Based
on a Linear Regression Model. Uttam Chand, Boston University.


Consider $y_1, \dots, y_{N_1}, y_{N_1+1}, \dots, y_N$, a set of observations ordered in time. If the
y's are normally and independently distributed according to $N(\alpha + \beta(j - 1), \sigma^2)$ and we
want to find out if the y's have changed with time, we usually employ a “Student's” t type
of statistic with N - 2 degrees of freedom. If, as a consequence of the impact of a certain
unknown political or economic change in the past on the y's, the y's actually constitute
two independent normal samples $y_1, \dots, y_{N_1}$ and $y_{N_1+1}, \dots, y_N$ distributed according to
$N(m_1, \sigma^2)$, $N(m_2, \sigma^2)$ respectively, a two-sample “Student's” t also based on N - 2 degrees
of freedom would be the appropriate statistic to use for the hypothesis $m_1 = m_2$. If, in
fact, the latter situation describes the correct state of affairs, and the statistician employs
the “Student's” t based on the regression model, he commits an error. The present paper
investigates the nature of such an error in the light of the point of impact as determined
by the magnitude of $N_1$ and the intensity of the impact as determined by the standardized
‘distance’ of this extraneous ‘shock’ on the ordered set of observations y.

28. Corrections for Non-normality for the Two-sample t and the F Distributions
Valid for High Significance Levels. Ralph A. Bradley, McGill University. 

The effects of non-normality of the parent population on common tests of significance
have long been of concern in the application of statistical methods to experimental data.
In this paper, the two-sample t-statistic is expressed as a simple multiple of the cotangent
of an angle between two lines in a space of dimensionality one less than the total of the
sample sizes; the F-statistic for k samples is expressed as a multiple of the cotangent of
an angle between a line and a plane of (k - 1) dimensions in a space, again, of
dimensionality one less than the total of the sample sizes. The geometrical formulation is such as to
suggest approximations to the distributions of these statistics valid for large values of
the statistics, and these approximations are obtained. The approximations are shown to be
exact in the special cases where the parent population is normal, and a method of
evaluation of correction factors is given for a wide class of parent populations. The approximation
procedures are valid for the distributions under both null and non-null hypotheses.

29. Some Tests Based on the Empirical Distribution Function. (Preliminary
Report). James F. Hannan, University of North Carolina.

Let $X = (X_1, X_2, \dots, X_n)$ be an independent sample of n where $X_i$ has the continuous
c.d.f. $F(x)$. Let $S_n(x)$ be the empirical distribution function. Acceptance regions of
the type $\{X \mid S_n(x) \le \phi(x) \text{ for all } x\}$ are considered for different specifications of $\phi$ and their
probabilities evaluated. The method of evaluation consists in identifying the regions with
regions defined in terms of the order statistics of a sample of n from the uniform distribution
on the interval (0, 1). The result obtained for $\phi(x) = F(x) + c/n$, $0 < c$, integral $\le n$,
is used to provide a direct proof of the Kolmogoroff result

$$\lim_{n \to \infty} P\left[n^{1/2} \sup_x (S_n(x) - F(x)) < z\right] = 1 - e^{-2z^2},$$

while that obtained for $\phi(x) = F(x) + t$, $0 < t < 1$, gives the exact c.d.f. of the statistic
$\sup_x (S_n(x) - F(x))$.
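
The one-sided statistic and its limiting distribution are simple to compute. The sketch below is illustrative only (function names are hypothetical): it uses the fact that, for continuous F, the supremum of $S_n(x) - F(x)$ is attained at one of the ordered observations, where $S_n$ jumps to i/n.

```python
import math

def one_sided_sup(sample, cdf):
    """sup over x of S_n(x) - F(x), for a continuous hypothesised c.d.f. F."""
    xs = sorted(sample)
    n = len(xs)
    # S_n equals (i + 1)/n at the (i + 1)-th smallest observation.
    return max((i + 1) / n - cdf(x) for i, x in enumerate(xs))

def limiting_cdf(z):
    """Limit of P[sqrt(n) * sup (S_n - F) < z] as n grows; zero for z <= 0."""
    return 1.0 - math.exp(-2.0 * z * z) if z > 0 else 0.0
```

For the sample (0.7, 0.1, 0.4) tested against the uniform distribution on (0, 1), the statistic is 0.3. The limiting form applies to large n; the exact distribution for finite n is the subject of the abstract and is not reproduced here.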

30. On a Generalization of the Behrens-Fisher Problem. (By Title). John E. 
Walsh, Rand Corporation, Santa Monica, California. 

Let $m + n$ independent observations be available where it is only known that a specified
$m$ of them are from continuous symmetrical populations with common median $\mu$ while the
remaining $n$ are from continuous symmetrical populations with common median $\nu$. This is
the generalization of the Behrens-Fisher problem investigated; some tests and confidence
intervals for $\mu - \nu$ which are valid for the generalized situation are presented. For
definiteness, suppose that $n < m$. The procedure used is to subdivide the $m$ observations
(common median $\mu$) into $n$ groups of nearly equal size and form the mean of the
observations for each group. Pair the $n$ means with the remaining $n$ observations and
subtract the value of each observation from the value of the mean with which it is paired.
The resulting $n$ values represent independent observations from populations with common
median $\mu - \nu$. Tests and confidence intervals for $\mu - \nu$ are obtained by applying
the results of "Applications of Some Significance Tests for the Median Which are Valid Under
Very General Conditions" (Jour. Amer. Stat. Assn., Vol. 44 (1949), pp. 342-55) to these $n$
values. To measure the "information" lost by using the generalized tests when one actually
has two independent samples from normal populations, power efficiencies are computed with
respect to (a) Scheffé's "best" t-test solution and (b) most powerful solution when ratio of
variances is known. Case (a) yields an upper bound while case (b) furnishes a lower bound
for the actual efficiency.
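The grouping and pairing step described above can be sketched as follows. This is an illustrative Python fragment, not the author's procedure in full: the name `walsh_differences` is hypothetical, and groups are assumed to be formed by position in the list, which presumes the order of the observations carries no information.

```python
def walsh_differences(xs, ys):
    """Group means of xs minus the paired ys (requires len(ys) <= len(xs))."""
    m, n = len(xs), len(ys)
    if not 0 < n <= m:
        raise ValueError("need 0 < len(ys) <= len(xs)")
    base, extra = divmod(m, n)
    diffs, start = [], 0
    for i, y in enumerate(ys):
        # The first `extra` groups get one more element, so sizes differ by at most 1.
        size = base + 1 if i < extra else base
        group = xs[start:start + size]
        diffs.append(sum(group) / size - y)
        start += size
    return diffs
```

The returned values are the $n$ differences to which a test for the median would then be applied.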

31. Construction of Partially Balanced Designs with Two Accuracies. (By Title).
S. S. Shrikhande, University of North Carolina and Nagpur College, Nagpur,
India.

Various methods of construction of partially balanced designs first introduced by Bose
and Nair (Sankhya, Vol. 4 (1939), pp. 337-373) have been considered. Two of the methods
given are generalisations of a difference theorem given by them. Another method is the
inversion of an unreduced balanced incomplete block design with h - 2. Use has also been
made of the existing balanced incomplete block design in another direction. A number of
designs can also be obtained by methods of finite geometries and especially by omitting a
number of treatments and certain blocks from the complete lattice designs. Use of curves
and surfaces in finite geometries and the use of multifactorial designs given by Plackett
and Burman (Biometrika, Vol. 33 (1946), pp. 305-325) are also indicated.

32. Designs for Two-way Elimination of Heterogeneity. (By Title). S. S.
Shrikhande, University of North Carolina and Nagpur College, Nagpur,
India.

Use has been made of the existing balanced and some partially balanced designs for two-way
elimination of heterogeneity with at most two accuracies. Particular cases of these
designs were given by Youden (Contributions from Boyce Thompson Institute, Vol. 9 (1937),
pp. 317-326) and Bose and Kishen (Science and Culture (1939), pp. 136-137). The method
depends upon interchanging the positions of various treatments in the different columns
(blocks), if necessary, so as to satisfy certain conditions.

33. Designs for Animal Feeding Experiments. (By Title). S. S. Shrikhande,
University of North Carolina and Nagpur College, Nagpur, India.

In animal-feeding experiments change-over designs are generally preferable to continuous
feeding experiments. In change-over designs both the direct and carry-over treatment
effects are important. Use of balanced and partially balanced incomplete block designs
toward this end has been considered.

34. A Truncated Sequential Procedure for Interval Estimation, with Applications
to the Poisson and Negative Binomial Distributions. (Preliminary Report).
(By Title). D. Martin Sandelius, University of Uppsala, Sweden, and
University of Washington.

Let $x, y_1, y_2, \ldots$ be a sequence of random variables defined in $(0, \infty)$, and let
$n$ be the smallest integer satisfying $\sum_{i=1}^{n+1} y_i > tx$, where $t > 0$ is a
non-random quantity. Define $u_k$ either as $\sum_{i=1}^{k} y_i / x$ or as the smallest
integer exceeding $\sum_{i=1}^{k} y_i / x$, $k = 1, 2, \ldots$. Given the distribution
function $F(x; \theta)$ of $x$ and, for any $t$, the conditional distribution of $n$ with
respect to $x$, the distribution of $u_k$ is obtained. The problem is to determine a
confidence interval for $\theta$ with confidence coefficient $1 - \alpha$ on the basis of
either an observation on $u_k$, if $u_k < t$, or an observation on $n$, if $n \le k - 1$.
The following procedure is proposed.

If $u_k < t$, choose $\theta_{11}$ and $\theta_{12}$ according to a rule satisfying
Prob $(\theta_{11} < \theta < \theta_{12} \mid u_k < t) \ge 1 - \alpha$. If $n \le k - 1$,
choose $\theta_{21}$ and $\theta_{22}$ such that
Prob $(\theta_{21} < \theta < \theta_{22} \mid n \le k - 1) \ge 1 - \alpha$.
For continuous $u_k$, the following cases are discussed: a) $x = \theta$ with probability 1,
and $n$ has, for any $t$, a Poisson distribution with mean $t\theta$; b) $x$ has a Gamma
distribution with mean $\theta$, and the conditional distribution of $n$ with respect to $x$
is, for any $t$, a Poisson distribution. Both cases may, for instance, be applied to
bacterial counting.

35. A Generalization of the Method of Maximum Likelihood: Estimating a 
Mixing Distribution. (Preliminary Report). (By Title). Herbert Robbins, Uni¬ 
versity of North Carolina 

Let $\theta$ be a vector random variable with distribution function $G(\theta)$ belonging to
some class $\mathcal{G}$, let $x$ be a vector random variable whose frequency function
$f(x; \theta)$ depends on $\theta$, and let $g_G(x) = \int f(x; \theta)\, dG(\theta)$ be the
resulting frequency function of $x$. From a sample $x_1, \ldots, x_n$ it is required to
estimate $G(\theta)$. The generalized method of maximum likelihood consists in using the
estimate $G_n(\theta; x_1, \ldots, x_n)$ in $\mathcal{G}$ for which $\prod_{i=1}^{n} g_G(x_i)$
is a maximum. Under certain restrictions this method is consistent as $n \to \infty$.

Any consistent method of estimating the mixing distribution $G(\theta)$ from the sequence
$x_1, x_2, \ldots$ yields a solution of parametric statistical decision problems in the
following manner: from past values $x_1, \ldots, x_{n-1}$ we estimate $G(\theta)$, and then
use the corresponding Bayes solution of the decision problem to reach our decision for
$x_n$, even though the value $\theta_n$ which produced $x_n$ is different from those which
produced $x_1, \ldots, x_{n-1}$. In certain cases of long-term experimentation this approach
seems more reasonable than the minimax method which decides on the course of action
appropriate to $\theta_n$ on the basis of $x_n$ only, and ignores the information about the
prior distribution contained in $x_1, \ldots, x_{n-1}$.
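The idea of maximizing $\prod_i g_G(x_i)$ over a class of mixing distributions can be illustrated in a very restricted setting. The sketch below is not the method of the abstract: it assumes, purely for illustration, a Poisson frequency function $f(x; \theta)$ and a class $\mathcal{G}$ of distributions supported on a fixed finite grid of $\theta$ values, and it uses the EM iteration (a swapped-in technique) to increase the likelihood. All names are hypothetical.

```python
import math

def poisson_pmf(x, theta):
    """Poisson frequency function f(x; theta), assumed here for illustration."""
    return math.exp(-theta) * theta ** x / math.factorial(x)

def mixture_loglik(xs, grid, w):
    """log of prod_i g_G(x_i) when G puts weight w[j] on grid[j]."""
    return sum(
        math.log(sum(wj * poisson_pmf(x, t) for wj, t in zip(w, grid)))
        for x in xs
    )

def em_mixing_weights(xs, grid, iters=200):
    """EM iterations for the weights of a mixing distribution on a fixed grid."""
    m = len(grid)
    w = [1.0 / m] * m  # start from the uniform distribution on the grid
    for _ in range(iters):
        new_w = [0.0] * m
        for x in xs:
            post = [wj * poisson_pmf(x, t) for wj, t in zip(w, grid)]
            total = sum(post)
            for j in range(m):
                new_w[j] += post[j] / total  # posterior weight of grid[j] given x
        w = [v / len(xs) for v in new_w]
    return w
```

Each EM step cannot decrease the likelihood, so the returned weights are at least as good as the uniform starting point; whether they approach the true mixing distribution depends on conditions of the kind the abstract alludes to.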


36. Smallest Average Confidence Sets for the Simultaneous Estimation of k
Normal Means. (By Title). Raghu Raj Bahadur, University of North Carolina.


Let v = (th , , zim ; ■ • , »i . > 7 ,n|iiilA ti <' n " H1 ' 

samples of sizes n\ , n 2 , , nj. from norm# 1 I ... 

ir, having mean a, and variance o-“ Writing/*^ ^ ' Bll y pariund rr * ^ j, 


,,^i. HrrK'l" 


JCiudidcan space of all points ju by R- ^ lV , rn 


— (<n 


a k ), and any sot-valued funfi ! fP rtam . „ 

(•which BAtlS 1 "* , falilC. ll'ld fd < V 

/(») 


jj,r i dm*. 
A * 

„ ,1-1 r 


% it* 1 ?*' 

? nm > ? 


i»vr 


having subsets of R as its values (which 3,1 ,/ %)»j^inp 




( Jrd 


ah |u 

«(/ | m, <r) = probability of the statement ‘V (| j- constructing /** p follunc * hv* n 
Lebrhguo measure of J(v). We consulci tlmpro ^^ |[ fi oh tail*[«/(»? * * wh^re 
^ and /3 "as small as possible” One of ^ ^ ** 




ir*’ nub' 


{•(;» 

p. o < p < i, ict /" f („)(»> = Im‘s* 1,7-• ,1>> o*«*” v . 
f. = fir's," 1 rr,, , S” = nr's;* 1 (*„ - S,) s , A f , x f v _ k l - ?’■ y Si n. I. ThM ' 
constants, and f(p) being detormined by P(xi > ^ ( ,f ficcdom (" ' L m,) js any other 
pendent ehi-squarc variables with k, N — t d l 'K‘' q < c < lin ‘* ^ \o ,(t>) differ by 
(a) obviously rr(/x f(p ) I m, e\) = p for all/* and « c ’ oithcr (,•) /(a) I ji, cXl 

function such that «(/ | m, c\) < p for all g and * ^ | ^ c ^)| > sup 

a set of measure zero for almost every », or (**) 

for every a. 


NEWS AND NOTICES

Readers are invited to submit to the Secretary of the Institute news items of general
interest.

Personal Items

Mr. Harry H. Goode, formerly head of the Special Projects Branch, Special
Devices Center, Office of Naval Research, New York, is now supervisor of the
Aero-Physics Group, Aeronautical Research Center, University of Michigan,
Ann Arbor, Michigan.

Mr. William G. Howard, who was previously employed by the Johns Hopkins
University, Institute for Cooperative Research, is presently employed as Mathematical
Statistician in the Air Studies Division, Library of Congress.

Miss Margaret Kampschaefer has accepted a position on a U. S. Bureau of Labor
Statistics project, Minnesota Division of Employment and Security. She was
previously employed as Junior Mathematician at the Argonne National Laboratory,
Naval Reactor Division, Chicago, Illinois.

Dr. Albert Noack has recently been appointed Professor of Actuarial Mathematics
at the University of Koeln, Germany.

Second Berkeley Symposium on Mathematical Statistics and Probability

The Second Berkeley Symposium will be held at the Statistical Laboratory,
University of California, Berkeley, from July 31 to August 12, 1950, with the
cooperation of the American Statistical Association (Biometrics Section), the
Biometric Society (Western North American Region), the Econometric Society,
the Institute of Mathematical Statistics, the Institute of Transportation and
Traffic Engineering (UC), and the Office of Naval Research.

The Symposium will include sessions on mathematical statistics, probability,
biometrics, econometrics, traffic engineering, astronomy, and physics. The complete
program may be obtained from the Statistical Laboratory. The papers will
be published by the University of California Press as the Proceedings of the
Second Symposium.

Cumulative Index of Volumes 1-20 

Attention is called to the fact that there is now available a cumulative index 
for Volumes 1 through 20 (1930-1949) of the Annals of Mathematical Statistics. 
Copies may be secured from the office of the Secretary-Treasurer for $1.00 per
copy.

New Members

The following persons have been elected to membership in the Institute
(December 1, 1949 to February 28, 1950):

Bain, John C., B.A. (Univ. of Toronto), President's Statistician, Abitibi Power
& Paper Company, Ltd., 408 University Avenue, Toronto 2, Ontario,
Canada.

Blakemore, George J., Jr., A.B. (George Wash. Univ.), Student at George Washington
University, 1748 Hobart St., NW, Washington 10, D. C.

Bross, Irwin D. J., Ph.D. (North Carolina State College), Research Associate,
Department of Biostatistics, School of Public Health, The Johns Hopkins
University, 615 North Wolfe Street, Baltimore 5, Maryland.

Cansado Maceda, Enrique, Ph.D. (University of Madrid), Assistant Professor
of Mathematical Statistics, Faculty of Sciences, University of Madrid and
Official of the National Institute of Statistics, Paseo de Rosales, 50, Madrid,
Spain.

Clatworthy, Willard H., M.A. (Univ. of Kentucky), Student at the University
of North Carolina, Box 168, Chapel Hill, North Carolina.

Dinsmore, Robert J., A.B. (Univ. of Calif.), Student at the University of California,
Berkeley, California, 2428 Milvia St., Berkeley 4, California.

Enell, John W., Eng. Sc.D. (New York Univ.), Assistant Professor of Administrative
Engineering, New York University, 71 Ayers Court, West Englewood,
New Jersey.

Flores, Anna M., M.Sc. (Univ. of Mexico), Mathematician, Torres Adalid
#511, Mexico City.

Garner, Norman R., B.A. (Univ. of Rochester), Graduate Student at University
of North Carolina, 15 Goldston Ave., Carrboro, North Carolina.

Hannan, James F., M.A. (Harvard), Research Assistant, Department of Mathematical
Statistics, University of North Carolina, P.O. Box 168, Chapel
Hill, North Carolina.





Klein, Joseph, B.S. (Rutgers), Graduate Student at Rutgers University, P.O.
Box 501, Red Bank, New Jersey.

Lewis, Evan J., Ph.D. (Cornell Univ.), Physicist, Corning Glass Works, Corning,
New York.

Palekar, Madhukar N., B.S. (Bombay), Graduate Student in Department of
Mathematical Statistics and Departmental Assistant, 108 Furnald Hall,
Columbia University, New York 27, New York.

Page, Woodrow W., M.A. (Oklahoma Univ.), Graduate Student, University of
North Carolina, 24-1 Jackson Circle, Chapel Hill, North Carolina.

Pretorius, S. J., Ph.D. (Univ. of London), Professor of Statistics, University of
Stellenbosch, Soeteweide, Stellenbosch, Union of South Africa.

Price, Don C., M.A. (Kent State Univ.), Student, Department of Mathematical
Statistics, University of North Carolina, 1621 Shorb Ave., N.W., Canton S,
Ohio.

Scalora, Frank S., A.B. (Harvard), Assistant in Mathematics, 106 Mathematics
Building, University of Illinois, Urbana, Illinois.

Somerville, Paul N., B.Sc. (Alberta, Canada), Graduate Student in Department
of Mathematical Statistics, University of North Carolina, 316-B Dormitory,
Chapel Hill, North Carolina.

Sirken, Monroe G., M.A. (Univ. of Calif. at L. A.), Research Associate, Laboratory
of Statistical Research, Department of Mathematics, University of
Washington, Seattle, Washington.

Stearn, Joseph L., M.S. (College of N. Y.), Mathematician, U. S. Coast &
Geodetic Survey, Department of Commerce, Washington, D. C.

Whelan, Walter J., M.A. (Boston Univ.), Student, Department of Mathematical
Statistics, Columbia University, New York, 119 Wilmington Ave., Dorchester
24, Massachusetts.

Wile, Janet L., A.B. (Univ. of Rochester), Statistician, Department of Defense,
Army and Transportation Corps, #156, 1813 Queens Lane, Arlington,
Virginia.

Wilhelmsen, Lars, Aktuarkandidat (Oslo Univ.), Actuary, Storebrand, Boks
425, Oslo, Norway.


REPORT OF THE CHAPEL HILL MEETING OF THE INSTITUTE 

The forty-second meeting of the Institute of Mathematical Statistics was
held jointly with the Biometric Society (Eastern North American Region) at
the Chapel Hill campus of the University of North Carolina on Friday, March
17, and Saturday, March 18, 1950. One hundred twenty-one persons registered,
including the following members of the Institute:

R. L. Anderson, T. W. Anderson, Geoffrey Beall, C. A. Bennett, Mrs. C. A. Bennett, Nils
Blomqvist, R. C. Bose, R. A. Bradley, Irwin Bross, Glen Burrows, L. D. Calvin, Uttam
Chand, W. G. Cochran, A. C. Cohen, Jr., W. S. Connor, Jr., P. P. Crump, E. E. Cureton,
R. C. Davis, W. L. Deemer, T. G. Donnelly, Churchill Eisenhart, J. W. Fertig, S. G.
Ghurye, Leon Gilford, B. G. Greenberg, F. E. Grubbs, Max Halperin, J. F. Hannan, Boyd
Harshbarger, C. R. Henderson, Wassily Hoeffding, Harold Hotelling, A. S. Householder,
W. G. Howard, S. L. Isaacson, A. W. Kimball, Jr., B. F. Kimball, Marguerite Lehr, Guido
Iiiscrre, Eugene Lukacs, C. L. Marks, H. A. Meyer, Paul Minton, D. J. Morrow, Jack
Moshman, G. M. Motley, M. L. Norden, H. W. Norton, Ingram Olkin, Paul Peach, J. A.
Rafferty, Wyman Richardson, Jr., Herbert Robbins, S. N. Roy, S. A. Schmitt, R. K. Herding,
D. H. Shepard, P. N. Somerville, E. W. Stacy, J. W. Tukey, D. F. Votaw, Jr.,
F. M. Wadley, M. A. Woodbury, Marvin Zelen.

Professor R. L. Anderson presided at the opening session for contributed
papers on Friday morning. The following papers were presented:

1. A Method of Estimating the Parameters of an Autoregressive Time Series. Mr. S. G.
Ghurye, University of North Carolina.

2. Most Powerful Rank Order Tests. Professor Wassily Hoeffding, University of North
Carolina.

3. The Comparison of Percentages in Matched Samples. Professor W. G. Cochran, Johns
Hopkins University.

4. A Method of Estimating Components of Variance in Disproportionate Numbers. Professor
H. L. Lucas, North Carolina State College.

5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying the Values
of Two Parameters. Mr. S. L. Isaacson, Columbia University.

6. A Note on Orthogonal Arrays. Professor R. C. Bose, University of North Carolina.

7. Transformations Related to the Angular and the Square Root. Mr. M. F. Freeman
and Professor J. W. Tukey, Princeton University.

8. Standard Inverse Matrices for Fitting Polynomials. Mr. E. J. Verlinden, North Carolina
State College.

On Friday afternoon Dr. James A. Rafferty, School of Aviation Medicine,
Randolph Field, Texas, gave an invited address on Mathematical Models in
Biology. Professor Gertrude M. Cox then presided at a session for contributed
papers, at which the following papers were presented:

1. Small Sample Performance of Biological Statistics. Mr. Irwin Bross, Johns Hopkins
University.

2. Methodology in the Study of Physical Measurements of School Children. Professor
B. G. Greenberg and Professor A. H. Bryan, University of North Carolina.

3. Tetrad Analysis in Yeast. Dr. A. S. Householder, Oak Ridge National Laboratory.

4. Contribution to the Probabilistic Theory of Neural Nets I. Randomization of Refractory
Periods and of Stimulus Intervals. Professor Anatol Rapoport, University of
Chicago.

5. Theoretical and Experimental Aspects in the Removal of Airborne Matter by the Human
Respiratory Tract. Professor H. D. Landahl, University of Chicago. (Read by Professor
Rapoport.)

6. An Application of Biometrics to Zoological Classifications. Dr. F. M. Wadley, Navy
Department, Washington, D. C.

7. The Analysis of Hematological Effects of Chronic Low-level Radiation. Mr. Jack Moshman,
United States Atomic Energy Commission, Oak Ridge, Tennessee.

A joint dinner of the two sponsoring organizations was held at the Carolina
Inn on Friday evening, with an attendance of sixty-two. Professor W. G. Cochran
as toastmaster introduced Chancellor R. B. House of the University of North
Carolina, who welcomed the gathering with words and music. Professor Gertrude
M. Cox responded for the Biometric Society and Professor D. F. Votaw for the
Institute.

Professor Harold Hotelling presided at a Saturday morning symposium on
multivariate analysis. Professor E. E. Cureton of the University of Tennessee
gave the opening address on Statistical Problems in Psychological Testing. After
a lively discussion the following contributed papers were presented:

1. Accuracy of a Linear Prediction Equation in a New Sample. Professor George E.
Nicholson, Jr., University of North Carolina.

2. Independence of Quadratic Forms in Normally Correlated Variables. Professor Yukiyosi
Kawada, Tokyo University of Literature and Science, Tokyo, Japan. (Read
by the chairman.)

3. Bounds on the Distribution of Chi-square. Mr. S. A. Vora, University of North
Carolina.

This was followed by a Biometric Society address by Professor C R. Henderson 
of Cornell University on Estimation of Genetic Parameters. 

Professor W. G. Cochran presided at the final session for contributed papers
on Saturday afternoon. The following papers were presented:

1. Estimating the Mean and Standard Deviation of Normal Populations from Double
Truncated Samples. Professor A. C. Cohen, Jr., University of Georgia.

2. Minimax Estimates of Location and Scale Parameters. Mr. Gopinath Kallianpur,
University of North Carolina.

3. On Some Features of the Neyman-Pearson and Wald Theories of Statistical Inference,
Their Interrelations and Bearing on Some Usual Problems of Statistical Inference.
Professor S. N. Roy, University of North Carolina.

4. Note on Uniformly Best Unbiased Estimates. Mr. R. C. Davis, Naval Ordnance Test
Station, Inyokern, Calif.

5. Competitive Estimation. Professor Herbert Robbins, University of North Carolina.

6. The Effect of an Unknown 'Location Disturbance' on "Student's" t Based on a Linear
Regression Model. Professor Uttam Chand, Boston University.

7. Corrections for Non-normality for the Two-sample t and F Distributions Valid for
High Significance Levels. Professor Ralph A. Bradley, McGill University.

8. Some Tests Based on the Empirical Distribution Function. Mr. J. F. Hannan, University
of North Carolina.

9. On a Generalization of the Behrens-Fisher Problem. (By title). Dr. John E. Walsh,
Rand Corporation, Santa Monica, Calif.

10. Construction of Partially Balanced Designs with Two Accuracies. (By title). Mr.
S. S. Shrikhande, University of North Carolina and Nagpur College, Nagpur, India.

11. Designs for Two-way Elimination of Heterogeneity. (By title). Mr. S. S. Shrikhande.

12. Designs for Animal Feeding Experiments. (By title). Mr. S. S. Shrikhande.

13. A Truncated Sequential Procedure for Interval Estimation, with Applications to the
Poisson and Negative Binomial Distributions. (By title). Mr. D. Martin Sandelius,
University of Washington and Uppsala University, Uppsala, Sweden.

14. A Generalization of the Method of Maximum Likelihood: Estimating a Mixing Distribution.
(By title). Professor Herbert Robbins, University of North Carolina.

15. Smallest Average Confidence Sets for the Simultaneous Estimation of k Normal
Means. (By title). Mr. Raghu Raj Bahadur, University of North Carolina.

About eighty-five members of the two organizations attended a tea given by 
Professor and Mrs. Hotelling at the conclusion of the Saturday afternoon 
session. 


Herbert Robbins 

Assistant Secretary 




FUNDAMENTAL LIMIT THEOREMS OF PROBABILITY THEORY 1 

By M. Loève 2

University of California, Berkeley 

no sooner is Proteus caught 
than he changes his shape 

1. Introduction. The fundamental limit theorems of Probability theory may
be classified into two groups. One group deals with the problem of limit laws
of sequences of sums of random variables, the other deals with the problem of
limits of random variables, in the sense of almost sure convergence, of such
sequences. These problems will be labelled, respectively, the Central Limit Problem
(CLP) and the Strong Central Limit Problem (SCLP). Like all mathematical
problems, the CLP and SCLP are not static; as answers to old queries are
discovered they experience the usual development and new problems arise. The
development consists in (i) simplifying proofs and forging general tools out of
the special ones, (ii) sharpening and strengthening results, (iii) finding general
notions behind the results obtained and extending their domains of validity.
Analysis of this growth will put in relief the role and the interconnections of the
fundamental limit theorems.

Summary. The growth of the CLP for independent summands can be divided
into three (overlapping) periods. The first covers the Bernoulli case and the
corresponding limit theorems of Bernoulli, de Moivre and Poisson. The first two
theorems gave rise to the notions, from which the classical CLP stems, of
the Law of Large Numbers (LLN) and of Normal Convergence (NC). Poisson's
approach belongs to the set-up of the modern CLP.

The second period extends over two centuries and is devoted to the extension
of the domains of validity of LLN and NC. This is the classical CLP period.
Lyapunov's crucial work, submitted to the above treatment, led to the discovery
of the natural boundaries of these domains by Lindeberg, Kolmogorov, Feller
and P. Lévy.

However, the LLN and NC problems are but two particular cases of the
general problem of limit laws of sequences of sums of independent random
variables. The coming into sight and the solution of this problem (the third
period of the CLP) covers less than ten years. The tools forged for the classical
CLP proved to be powerful enough and the final solution is due to P. Lévy,
Khintchine, Gnedenko and Doeblin.

1 This paper was presented to the New York meeting of the Institute of Mathematical 
Statistics on December 27, 1949. 

Editor's Note. The Institute of Mathematical Statistics has formed a Committee on
Special Invited Papers to invite lecturers to deliver expository addresses to the Institute
with the understanding that the Special Invited Papers are to be published in the Annals
of Mathematical Statistics. This paper is the first one invited by the Committee.

2 This work is supported in part by the Office of Naval Research.



The CLP for dependent variables started with so called Markoff chains.
The study of their limit properties is due essentially to Markov, S. Bernstein
and Doeblin. For more general forms of dependence the LLN and NC problems
were investigated by P. Lévy and Loève after the crucial work of S. Bernstein.
The modern CLP was considered only recently (Loève).

The SCLP stems from the strengthening by Borel of the Bernoulli theorem
and the sharpening of Borel's result by Khintchine. They gave rise to the notions
of Strong Law of Large Numbers (SLLN) and of the Law of the Iterated
Logarithm (LIT). 3 The domains of validity were extended to their boundaries
by Kolmogorov, P. Lévy and Feller. In the case of dependence, results are due
to G. D. Birkhoff, P. Lévy, W. Doeblin, and Loève. However, the SCLP has
not attained, at present, the harmonious development of the CLP.

Notations. Let $\mathcal{L}(X)$ be the law of a (real) random variable (r.v.) $X$. The law
is defined by the distribution function (d.f.) $F(x) = P(X < x)$. As is well known,
$\mathcal{L}(X)$ is determined by the characteristic function (ch.f.)

$$f(u) = \int_{-\infty}^{+\infty} e^{iux}\, dF(x), \qquad -\infty < u < +\infty.$$

When a r.v. possesses subscripts, the same subscripts will be used for its d.f.
and ch.f. $EX$ will denote the expectation of $X$:

$$EX = \int_{-\infty}^{+\infty} x\, dF(x),$$

and $\sigma^2(X)$ will denote the variance of $X$:

$$\sigma^2(X) = E(X - EX)^2.$$

With a random event $A$ we associate a r.v., to be called indicator of the event $A$,
which takes values 1 and 0 respectively, according as $A$ occurs or does not occur.
If $X$ is the indicator of an event $A$ of probability $p$, then $EX = p$ and
$\sigma^2(X) = pq$, where $q = 1 - p$. To avoid trivialities we shall assume that $pq \ne 0$.

Two laws $\mathcal{L}(X_1)$ and $\mathcal{L}(X_2)$ will be said to belong to the same
complete type if there exist two numbers $a \ne 0$ and $b$ such that
$P\{X_1 \le x\} = P\{aX_2 + b \le x\}$. If values of $a$ are restricted to positive values,
then the two laws are said to belong to the same type. If two independent r.v.'s obey
$\mathcal{L}$ and their sum belongs to the type of $\mathcal{L}$, then $\mathcal{L}$ and its
type are said to be stable. Three classes of laws play an essential role in the CLP: the
normal and the degenerate types and the Poisson complete types.

$\mathfrak{N}(m, \sigma)$ is a normal law if it is defined by

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-m)^2/2\sigma^2}\, dt \qquad (\sigma > 0).$$

3 For a very thorough and deep analysis of the NC and LIT problems and their solutions
see Feller, Bull. Am. Math. Soc., Vol. 51 (1945), pp. 800-832, under the same title as that
of the present paper.





$\mathcal{L}(m)$ is a law degenerate at $m$, if it attaches probability 1 to the value $m$.
$\mathfrak{P}(\lambda; a, b)$ is a Poisson law if

$$P(X = ak + b) = e^{-\lambda}\,\frac{\lambda^k}{k!} \qquad (\lambda > 0), \quad k = 0, 1, 2, \ldots;$$

the familiar Poisson law is $\mathfrak{P}(\lambda; 1, 0)$.

A law $\mathcal{L}(X_n)$ is said to converge to the law $\mathcal{L}(X)$ as $n \to \infty$,
if $F_n(x)$ converges to $F(x)$ at the continuity points of the latter. In this paper all
limits will be considered for $n \to \infty$, if not otherwise stated.

The structure of sequences of r.v.'s whose limit properties are investigated
will be called the limiting process of the problem. The limiting process of sequences
of sums is that of sequences of the form

$$S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}, \qquad \text{where } \nu_n \to \infty.$$

The limiting process of normed sums is that of sequences of the form
$\frac{S_n}{a_n} - b_n$ with $S_n = \sum_{k=1}^{n} X_k$, where $a_n > 0$ and $b_n$ are real
numbers. Normed sums are a special form of sequences of sums: take $\nu_n = n$,
$X_{n,k} = \frac{X_k}{a_n} - \frac{b_n}{n}$; then $S_{n,n} = \frac{S_n}{a_n} - b_n$.

To avoid repetitions we shall note, once and for all, that limit types rather than
limit laws appear in the case of normed sums, because, if $\mathcal{L}(X)$ is their limit law,
then any law of its type is obtainable as a limit law by a convenient change of
origin $b_n$ and of scale $a_n$, independent of $n$. The importance of the notion of
type is due, primarily, to this property. In fact, even more is true: if $\mathcal{L}(X_n)$
converges to $\mathcal{L}(X)$ and $\mathcal{L}(a_n X_n + b_n)$ converges to $\mathcal{L}(Y)$,
then $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ belong to the same type, provided neither is
degenerate (Khintchine [20]).

I. Central Limit Problem 

2. Origin of the CLP: Binomial case. Three limit theorems are at the origin
of the CLP; the first, due to Bernoulli ([2], 1713), laid the ground. Let $S_n$ be
the number of occurrences of an event $A$ of probability $p$ in $n$ identical and
independent trials. Then, for every $\varepsilon > 0$,

$$P\left\{\left|\frac{S_n}{n} - p\right| > \varepsilon\right\} \to 0.$$

Bernoulli found this result by a direct, but cumbersome, analysis of the behaviour
of the binomial probabilities

$$P\{S_n = k\} = C_n^k\, p^k q^{n-k}, \qquad k = 0, 1, \ldots, n.$$

Sharpening this analysis, de Moivre ([7], 1730) obtained the second limit theorem
of probability theory which, in the form given to it by Laplace, states that:
For every $x$

$$P\left\{\frac{S_n - np}{\sqrt{npq}} < x\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

Suppose now, with Poisson ([36], 1837), that the probability $p = p_n$ depends
upon the number $n$ of trials and, more precisely, that $p_n = \lambda/n$, where $\lambda$
is a positive constant. Write then $S_{n,n}$, instead of $S_n$, for the number of occurrences
of the considered event in a group of $n$ trials. By a direct analysis of the binomial
probabilities, much easier to carry out than the preceding ones, it follows that
for $k = 0, 1, \ldots$,

$$P\{S_{n,n} = k\} \to e^{-\lambda}\,\frac{\lambda^k}{k!}.$$


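Let $X_k$ be the indicator of the event $A$ in the $k$-th trial. The number of occurrences
$S_n$ is the sum $\sum_{k=1}^{n} X_k$ of $n$ of these independent and identically
distributed indicators. The first two limit theorems mean that

$$\mathcal{L}\left(\frac{S_n - ES_n}{n}\right) \to \mathcal{L}(0) \quad \text{and} \quad \mathcal{L}\left(\frac{S_n - ES_n}{\sigma(S_n)}\right) \to \mathfrak{N}(0, 1).$$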
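Both limit statements can be examined exactly for indicators, since the law of $S_n$ is binomial. The following Python sketch is an illustration only (the function names are not from the text): it evaluates $P\{|S_n/n - p| > \varepsilon\}$ and $P\{(S_n - np)/\sqrt{npq} < x\}$ from the binomial probabilities and compares the latter with the normal d.f.

```python
import math

def binom_pmf(n, k, p):
    """Binomial probability P{S_n = k}."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def deviation_probability(n, p, eps):
    """Exact P{|S_n/n - p| > eps} (Bernoulli's theorem says this tends to 0)."""
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if abs(k / n - p) > eps)

def normed_cdf(n, p, x):
    """Exact P{(S_n - np)/sqrt(npq) < x} (de Moivre-Laplace limit is the normal d.f.)."""
    s = math.sqrt(n * p * (1 - p))
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if (k - n * p) / s < x)

def normal_cdf(x):
    """d.f. of the normal law with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With, say, $p = 0.3$ and $\varepsilon = 0.05$, the deviation probability shrinks rapidly as $n$ grows from 100 to 1000, while the normed d.f. approaches the normal one.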


Thus we have two limiting processes (both special and completely specified
forms of normed sums), and two limit laws (more precisely two limit types, see
introduction), a degenerate and a normal one.

Poisson's limiting process is utterly different. $S_{n,n}$ is still a sum
$\sum_{k=1}^{n} X_{n,k}$ of independent and identically distributed indicators but, as $n$
varies, all $X_{n,k}$ change, $P(X_{n,k} = 1) = \lambda/n$ and

$$\mathcal{L}(S_{n,n}) \to \mathfrak{P}(\lambda; 1, 0).$$
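Poisson's limit can likewise be checked directly on the binomial probabilities. The sketch below is illustrative Python with hypothetical names; the cut-off `kmax` is an arbitrary choice made here. It measures the largest gap between $P\{S_{n,n} = k\}$ with $p_n = \lambda/n$ and the Poisson probabilities $e^{-\lambda}\lambda^k/k!$.

```python
import math

def binom_pmf(n, k, p):
    """Binomial probability P{S_n = k}."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability attached to k by the familiar Poisson law with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_gap(n, lam, kmax=20):
    """Largest |P{S_{n,n} = k} - Poisson probability| over k = 0, ..., min(n, kmax)."""
    p = lam / n  # the probability of the event now depends on n
    return max(
        abs(binom_pmf(n, k, p) - poisson_pmf(k, lam))
        for k in range(min(n, kmax) + 1)
    )
```

For fixed $\lambda$ the gap decreases as $n$ increases, in line with the convergence stated above.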


While the two first theorems with their special limiting processes and limit
laws played a central role in the development of Probability theory, Poisson's
result stood isolated and ignored until about fifteen years ago. 4 We shall see
further that there was a deep reason for its isolation and also that, surprisingly
enough, Poisson laws are, in a sense, more fundamental for the CLP than the
normal law.

3. The classical CLP and its extension. From the time of Laplace until 1935,
research in the domain of limit laws was centered about the extension to summands
other than indicators of the validity of the two first limit theorems.
This is the period of the classical CLP: Let $S_n = \sum_{k=1}^{n} X_k$ be sums of
independent r.v.'s. Find necessary and sufficient conditions for the LLN and for NC, i.e.,
conditions under which, respectively,

$$\mathcal{L}\left(\frac{S_n - ES_n}{n}\right) \to \mathcal{L}(0) \quad \text{and} \quad \mathcal{L}\left(\frac{S_n - ES_n}{\sigma(S_n)}\right) \to \mathfrak{N}(0, 1).$$

4 In Uspensky's textbook (1937!) Poisson’s law is mentioned once—in an exercise. 





It is assumed that the $EX_k$'s and $EX_k^2$'s exist. The d.f. not being completely
specified as in the Bernoulli case, the direct Bernoulli-de Moivre approach is of no
avail and general methods are necessary. The first to appear was the method of
moments relative to bounds of d.f. in terms of their moments (Tchebicheff [40],
Markov [37]). The relation

$$P\left\{\left|\frac{S_n - ES_n}{n}\right| > \varepsilon\right\} \le \frac{1}{\varepsilon^2}\,\frac{\sigma^2(S_n)}{n^2}, \qquad \varepsilon > 0,$$

together with

$$\sigma^2(S_n) = \sum_{k=1}^{n} \sigma^2(X_k),$$

entails at once a LLN theorem (Tchebicheff-Markov): If

$$\frac{1}{n^2} \sum_{k=1}^{n} \sigma^2(X_k) \to 0,$$

then the LLN holds.
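The bound behind this theorem is easy to evaluate. The following Python sketch is an illustration with hypothetical names: it computes the right-hand side $\sigma^2(S_n)/(\varepsilon^2 n^2)$ from a list of variances and, for the special case of indicators, compares it with the exact binomial tail.

```python
import math

def chebyshev_bound(variances, eps):
    """Upper bound sigma^2(S_n) / (eps^2 n^2) on P{|S_n - ES_n|/n > eps}.

    `variances` lists sigma^2(X_k) for independent summands X_1, ..., X_n.
    """
    n = len(variances)
    return sum(variances) / (eps ** 2 * n ** 2)

def exact_binomial_tail(n, p, eps):
    """Exact P{|S_n/n - p| > eps} when the X_k are indicators of probability p."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )
```

For 100 indicators with $p = 1/2$ and $\varepsilon = 0.1$ the bound equals 0.25, a good deal larger than the exact tail; the bound is crude but it tends to 0 whenever the stated condition holds.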

This result can be easily improved (bringing it into closer analogy with
Lyapunov's theorem): If there exists a constant $\delta > 0$ such that

$$\frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E\,|X_k - EX_k|^{1+\delta} \to 0,$$

then the LLN holds.

It contains then a Markov's LLN condition: LLN holds if
$E\,|X_k - EX_k|^{1+\delta} \le C$, where $C$ is independent of $k$.

In a much more elaborate form the method of moments gives also a NC theorem
(Tchebicheff-Markov): If $EY_n^k \to EZ^k$ for $k = 1, 2, \ldots$, and
$\mathcal{L}(Z) = \mathfrak{N}(0, 1)$, then $\mathcal{L}(Y_n) \to \mathfrak{N}(0, 1)$.

This theorem has been extended to more general limit laws. However the
inherent defects of the method of moments remain. Even if moments of all
orders exist, they do not necessarily determine a unique d.f. A definitive result
in this direction is the Fréchet-Shohat theorem: If $EY_n^k \to m^{(k)}$ for all $k$, there
exists a subsequence $\mathcal{L}(Y_{n_j})$ which converges to a limit law $\mathcal{L}$ with
moments $m^{(k)}$. Moreover, if the moment problem is determined, i.e., if the $m^{(k)}$
determine a unique law, then the whole sequence $\mathcal{L}(Y_n)$ converges to $\mathcal{L}$.

To apply the convergence theorem to the NC part of the classical CLP,
one has to assume existence of moments of all orders. In particular, it does not
seem suitable for proving Lyapunov's theorem. Yet, the simple truncation idea
(Markov) not only overcomes this seemingly insurmountable obstacle, but also
provides a method per se. It associates with the summands $X_k$ "truncated"
r.v.'s $X'_k$; for $k \le n$ and $c_n$ conveniently chosen real numbers,

$$X'_k = X_k \ \text{ if } |X_k| \le c_n, \qquad X'_k = 0 \ \text{ if } |X_k| > c_n.$$





Nevertheless, the method of moments is too cumbersome and was soon to be 
discarded in favor of that of ch.f.’s. 

The turning point for the entire CLP is Lyapunov's introduction of the
method of ch.f.'s. The ch.f.'s were well known and used already by Laplace.
However, the first convergence property, proved but not stated, is due to
Lyapunov [28]: If the ch.f.'s $g_n(u)$ of $\mathcal{L}(Y_n)$ converge to the ch.f.
$e^{-u^2/2}$ of $\mathcal{N}(0, 1)$, then $\mathcal{L}(Y_n) \to \mathcal{N}(0, 1)$.
From it he deduced the first general NC theorem [28, 29]:
If there exists a number $\delta > 0$ such that

$$\frac{1}{\sigma^{2+\delta}(S_n)} \sum_{k=1}^{n} E|X_k - EX_k|^{2+\delta} \to 0,$$

then NC holds.

The ch.f. became, in the hands of P. Lévy [21], a general tool, instrumental
in the subsequent tremendous growth of the CLP, with the so-called

Continuity Theorem. If the ch.f.'s $g_n(u)$ converge to a function g(u)
continuous at u = 0, then $\mathcal{L}(Y_n)$ converge to a limit law $\mathcal{L}$
and g(u) is its ch.f., and conversely.

The methods of ch.f.'s and of truncation dominate at present the limit
problems of Probability theory.

In spite of the generality of the above conditions for LLN and NC, they are
not necessary conditions. In fact they are not sharp enough since they assume
the existence of moments of higher order than those which figure in the classical
CLP. However the tools forged proved powerful enough to get its complete
solution. The truncation method yielded to Kolmogorov ([16], 1928) the
complete answer to the LLN problem. A "smoothing" device, due to Lyapunov,
provided Lindeberg ([20], 1922) with adequately sharp sufficient conditions.
Using ch.f.'s, P. Lévy ([22], 1922) proved Lindeberg's result and Feller ([11], 1935)
showed that, under a natural restriction, these conditions are also necessary.

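Solution of the classical CLP.

1. LLN holds if, and only if,

$$\sum_{k=1}^{n} \int_{|x| > n} dF_k(x + EX_k) \to 0 \quad \text{and} \quad \sum_{k=1}^{n} \frac{1}{n^r} \int_{|x| < n} x^r \, dF_k(x + EX_k) \to 0$$

for r = 1, 2.

2. NC holds and $\max_{k \le n} \frac{\sigma(X_k)}{\sigma(S_n)} \to 0$ if, and only if,
for every $\epsilon > 0$,

$$\frac{1}{\sigma^2(S_n)} \sum_{k=1}^{n} \int_{|x| > \epsilon\sigma(S_n)} x^2 \, dF_k(x + EX_k) \to 0.$$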
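
The condition in part 2 can be evaluated in closed form for simple summands. The sketch below assumes symmetric two-point summands $X_k = \pm c_k$ with probability 1/2 each (an illustrative family, not taken from the text), so that $EX_k = 0$, $\sigma^2(X_k) = c_k^2$, and the truncated integral equals $c_k^2$ when $c_k > \epsilon\sigma(S_n)$ and 0 otherwise. The function name `lindeberg_sum` is hypothetical.

```python
import math


def lindeberg_sum(c, eps):
    """Quantity of part 2 for independent X_k = +/- c[k] (prob. 1/2 each).

    Returns (1/s^2) * sum of c_k^2 over those k with c_k > eps * s,
    where s^2 = sum of c_k^2 is the variance of S_n.
    """
    s2 = sum(ck * ck for ck in c)
    s = math.sqrt(s2)
    return sum(ck * ck for ck in c if ck > eps * s) / s2


eps = 0.5
for n in (4, 16, 64):
    equal = [1.0] * n                                  # c_k = 1
    geometric = [2.0 ** k for k in range(1, n + 1)]    # c_k = 2^k
    print(n, lindeberg_sum(equal, eps), lindeberg_sum(geometric, eps))
```

With equal scales the quantity vanishes as soon as $\epsilon\sigma(S_n)$ exceeds 1. With geometrically growing scales the last summand alone carries about three quarters of the variance, so the quantity does not tend to zero and the criterion is not met.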


An unsatisfactory feature of the classical CLP is the assumption, made at
the start, of existence of certain moments. They are used to avoid, as $n \to \infty$,
the shift, towards infinite values, of the probability spread by changing the
origin and the scale of values of $S_n$. However there is no specific reason for
these special choices of norming quantities $a_n$ and $b_n$ except that, historically,





they appeared as a straightforward extension of Bernoulli and de Moivre ones.
Moreover, even if these moments do not exist, there is no reason not to try
to find norming quantities. (Take $X_k$'s to be independent and identically
distributed as follows: to $\pm\sqrt{m}$, where m = 1, 2, ..., attach probabilities
$\frac{3}{\pi^2 m^2}$. The second moments are infinite, yet norming $S_n$ by
$c\sqrt{n \log n}$, we have NC.) Thus the CLP becomes the problem of the LLN and NC
for general normed sums $\frac{S_n}{a_n} - b_n$.

The extended classical NC problem was solved, masterfully and independently,
by Feller ([10], 1935) using ch.f.'s and by P. Lévy ([25], 1935) who applied the
method of truncation. The extension of the results to the more general set-up
of the following section is trivial and will be given there. Feller also solved
([11], 1937) the extended LLN problem.

In this new set-up a question arises at once. Given the r.v.'s $X_k$, do there
exist numbers which will produce the desired convergence? If so, how can they be
found? This problem is perhaps more difficult than the previous one and is
specifically linked with the limiting process of normed sums. We shall give
here a criterion, due to Feller ([10], 1935), which solves entirely the NC
problem.⁵ Take as origin of values of the summands their medians and let
$c_n(\epsilon)$ be the g.l.b. of the x's for which $\sum_{k=1}^{n} P(|X_k| > x) \le \epsilon$.
Then norming quantities $a_n$ and $b_n$ such that

$$\mathcal{L}\left(\frac{S_n}{a_n} - b_n\right) \to \mathcal{N}(0, 1) \quad \text{and} \quad \max_{k \le n} P\left(\left|\frac{X_k}{a_n}\right| > \epsilon\right) \to 0$$

exist if, and only if, for every $\epsilon > 0$,

$$\frac{1}{c_n^2(\epsilon)} \sum_{k=1}^{n} \int_{|x| < c_n(\epsilon)} x^2 \, dF_k(x) \to \infty.$$


4. Modern CLP. At the same time that the classical CLP neared its happy
end, a new and much wider problem of limit laws appeared and, because the
necessary tools were at hand, was solved almost at once. Various particular
problems, of which the classical CLP is one, contributed to its set-up.

Since the discovery, in the Bernoulli case, of the LLN and NC, the problem
of limit laws has been centered about extensions of their domains of validity
for more and more general normed sums. A similar query about the Poisson
convergence would have provided us with a new problem. As soon as we drop
the restriction that in $\sum_k X_{n,k}$ the r.v.'s $X_{n,k}$ are indicators, we are
led to the problem of finding conditions under which laws of sums of
independent r.v.'s will converge to a Poisson law. We have here not only a different
limit law than in the CLP but also a more general limiting process. An utterly
different problem, stated and solved by P. Lévy [21], is the following: find the
possible limit laws of normed sums of independent and identically distributed r.v.'s
(the answer is that they are the stable laws). For the first time one does not inquire
about a completely specified limit law but about the class of all limit laws for
a fairly general limiting process. Thus, starting with limit theorems with
completely specified limiting processes and limit laws, after two centuries of struggle
Probability theory got rid of initial restrictions.

⁵ As for the LLN, norming numbers such that the LLN holds always exist, whatever
be the r.v.'s $X_k$. Hence, from the point of view of limit types of normed sums, the
degenerate type is to be considered as a degenerate form of every limit type.

The general set-up is now visible. The limiting process is that of sequences of
sums of independent r.v.'s. The queries are about the classes of possible limit
laws and conditions of convergence. However, so general a limit problem is
without content. In fact, the limiting process is that of arbitrary sequences of
r.v.'s: let $\{Y_n\}$ be any sequence of r.v.'s and take $X_{n,1} = Y_n$,
$\mathcal{L}(X_{n,k}) = \mathcal{L}(0)$ for k > 1. Any law $\mathcal{L}$ belongs to the
class of limit laws: take $\mathcal{L}(Y_n) = \mathcal{L}$. Hence
some restriction is needed. To find a "natural" restriction consider the previous
problems. Their common feature is that the limiting process is that of sequences
of sums of independent r.v.'s, the number of summands increasing indefinitely.
If we wish to emphasize this feature, a relatively small number of summands
ought not to have a preponderant role in the determination of the limit laws.
A "natural" restriction is then a requirement of uniform asymptotic negligibility
(uan) of the summands, i.e., for every $\epsilon > 0$, $P\{|X_{nk}| > \epsilon\} \to 0$
uniformly in k.

We come thus to the Modern CLP. Let $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$,
$\nu_n \to \infty$, be sums of r.v.'s $X_{n,k}$, mutually independent for every fixed n,
and such that

$$\max_k P(|X_{n,k}| > \epsilon) \to 0;$$

characterize the class $\{D\}$ of limit laws of the $\mathcal{L}(S_{n,\nu_n})$ and find
necessary and sufficient conditions for convergence to any element of this class.

The solution of this problem is essentially due to the results of investigation
of random functions X(t) with independent increments. Let X(0) = 0, divide
the interval (0, t) into $\nu_n$ subintervals $(t_{k-1}, t_k)$ with $t_0 = 0$, and denote
by $X_{nk}$ the increment $X(t_k) - X(t_{k-1})$. Then $X(t) = \sum_{k=1}^{\nu_n} X_{nk}$
where $X_{nk}$ are independent r.v.'s. If, moreover, X(t) is continuous in probability
for every t, i.e., if $\mathcal{L}\{X(t + h) - X(t)\} \to \mathcal{L}(0)$ as $h \to 0$,
then the $X_{nk}$ can be chosen to obey the uan restriction as $\nu_n \to \infty$.
Hence $\mathcal{L}\{X(t)\}$ might be expected to belong to $\{D\}$.

The particular case of the modern CLP for summands and limit laws with
finite second moments was solved by Bawly [1], using Kolmogorov's
characterization of X(t)'s with finite second moments [7]. The general problem,
thanks to a much more general result by P. Lévy ([24], 1934), was solved by
P. Lévy, Khintchine ([20], 1937), Gnedenko ([14], [15], 1938, 1939) and Doeblin
([8], 1938-1939). The method used throughout was that of ch.f.'s (except in
the case of Doeblin, who used also the P. Lévy "dispersion" function).

One can avoid an explicit introduction of the considered random function
X(t), limiting oneself to the corresponding (infinitely divisible) laws. For a
very large n, $S_{n,\nu_n}$ is, roughly speaking, a sum of a very large number $\nu_n$ of
very small (in probability) independent summands. This leads at once to the consideration





of laws which possess such a property for any $\nu_n$ and, first, the infinitely divisible
(i.d.) laws. A law is i.d. if it is a law of sums of an arbitrarily large number of
independent and identically distributed r.v.'s. In other words, f(u) is the ch.f.
of an i.d. law if $[f(u)]^{1/n}$ is a ch.f. for every positive integer n. One might
expect i.d. laws to belong to $\{D\}$ and, surprisingly enough, it turns out that,
because of the uan, $\{D\}$ contains only i.d. laws.

We can now state the solution of the modern CLP, in three parts. Let
$\int' = \int_{-\infty}^{-0} + \int_{+0}^{+\infty}$, let $\phi(x)$ be any function, defined and
non-decreasing in $(-\infty, -0)$ and $(+0, +\infty)$, with $\phi(-\infty) = \phi(+\infty) = 0$
and $\int_{-1}^{-0} x^2 \, d\phi(x) + \int_{+0}^{+1} x^2 \, d\phi(x) < \infty$,
and let $\alpha$ and $\beta$ be real numbers.

I. The function f(u) is the ch.f. of an i.d. law if, and only if,

$$\log f(u) = i\alpha u - \frac{\beta^2}{2} u^2 + \int' \left(e^{iux} - 1 - \frac{iux}{1 + x^2}\right) d\phi(x),$$

and f(u) determines uniquely $\alpha$, $\beta$ and $\phi(x)$ at all the continuity points
of the latter (P. Lévy).

Normal laws are obtained for $\phi(x) \equiv 0$ and Poisson laws correspond to the
$\phi(x)$ with one point of increase ($x \ne 0$) only. The fundamental role of Poisson
laws appears clearly since, roughly speaking, an i.d. law is the convolution of a
normal law and a continuum of Poisson ones. This role is further emphasized
by the following theorem (Khintchine [20]): A law is i.d. if, and only if, it is
the limit law of sequences of sums of independent Poisson r.v.'s. In other words,
the class of i.d. laws is the closure of laws of finite sums of independent Poisson
r.v.'s.
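
The correspondence between Poisson laws and functions $\phi$ with a single point of increase can be checked numerically. The sketch below assumes that the representation in I reads $\log f(u) = i\alpha u - \beta^2 u^2/2 + \int' (e^{iux} - 1 - iux/(1+x^2))\, d\phi(x)$ and restricts $\phi$ to finitely many jumps, so that the integral becomes a finite sum. The function names and the jump-list encoding are illustrative assumptions.

```python
import cmath


def log_chf_levy(u, alpha, beta, jumps):
    """log f(u) = i*alpha*u - beta^2 u^2 / 2
                 + sum_j w_j * (exp(i u x_j) - 1 - i u x_j / (1 + x_j^2)),
    for a function phi increasing only at finitely many points x_j != 0,
    by amounts w_j > 0, given as pairs (x_j, w_j)."""
    total = 1j * alpha * u - 0.5 * beta ** 2 * u ** 2
    for x, w in jumps:
        total += w * (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x * x))
    return total


def log_chf_poisson(u, lam):
    """Logarithm of the ch.f. of a Poisson law with parameter lam."""
    return lam * (cmath.exp(1j * u) - 1)


# One jump of size lam at x = 1, with alpha = lam / 2 cancelling the
# centering term, reproduces the Poisson ch.f.
lam = 3.0
for u in (-2.0, 0.3, 1.7):
    lhs = log_chf_levy(u, alpha=lam / 2, beta=0.0, jumps=[(1.0, lam)])
    print(u, abs(lhs - log_chf_poisson(u, lam)))
```

Dividing $\alpha$ and every jump by n, and $\beta$ by $\sqrt{n}$, divides $\log f(u)$ by n, which is the infinite divisibility property in this parametrization.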

II. The class $\{D\}$ of limit laws of the modern CLP coincides with that of i.d.
laws (P. Lévy-Khintchine).

Together with I this result characterizes in an explicit manner the class $\{D\}$.
An immediate question arises (Khintchine): What about the limit laws of normed
sums? The answer is the following (P. Lévy [27]). Let $y = \log|x|$,
$\psi_1(y) = \phi(x)$ for x < 0, $\psi_2(y) = -\phi(x)$ for x > 0. The limit
laws of normed sums, under uan, are the i.d. laws with convex $\psi_k(y)$, k = 1, 2.

In particular a Poisson law does not belong to this subclass $\{D_N\}$ of $\{D\}$,
hence cannot be obtained as a limit law of normed sums. This brings out the
deep reason for the isolation in which the Poisson law remained as long as the
limiting process was restricted to that of normed sums.⁶ II shows that, with
respect to the possible limit laws, the limiting process of the modern CLP is
definitely wider than that of the classical CLP and of its extension. However
the entire class $\{D\}$ can be obtained with normed sums, provided we consider
not only limit laws but also "accumulation" laws (P. Lévy-Khintchine): A law
is i.d. if, and only if, it is the limit law of a subsequence of normed sums of
independent and identically distributed r.v.'s.

⁶ A problem, specific for normed sums, arises: given r.v.'s $X_k$, find necessary and
sufficient conditions for existence of norming numbers such that the laws of normed sums
would converge to a given element of $\{D_N\}$ and, if they exist, find them. Feller's NC
criterion solves a particular case of this problem.

I and II provided Gnedenko and, independently, Doeblin with the properties
which allowed them to find conditions of convergence, thus completing the
solution of the modern CLP. Let

$$\sigma_\epsilon^2(X) = \int_{|x| < \epsilon} x^2 \, dF(x) - \left[\int_{|x| < \epsilon} x \, dF(x)\right]^2$$

denote a "truncated" variance of X.

III. Under uan, $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges, necessarily to an i.d. law,
"for a convenient choice of $b_n$", if, and only if,

(i) $\sum_{k=1}^{\nu_n} F_{nk}(x) \to \phi(x)$ for x < 0, $\sum_{k=1}^{\nu_n} [1 - F_{nk}(x)] \to -\phi(x)$ for x > 0

at the continuity points of $\phi$ and

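(ii) $\lim_{\epsilon \to 0} \liminf_{n} \sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{n,k}) = \lim_{\epsilon \to 0} \limsup_{n} \sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{n,k}) = \beta^2$.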

In particular, since normal laws correspond to $\phi \equiv 0$, the NC conditions
of Feller and P. Lévy follow: $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges to
$\mathcal{N}(0, 1)$ for a convenient choice of $b_n$ and uan holds if, and only if,
for every $\epsilon > 0$,

(i) $\sum_{k=1}^{\nu_n} \int_{|x| > \epsilon} dF_{nk}(x) \to 0$ and (ii) $\sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{nk}) \to 1$.

The first condition shows that among all limit laws under uan, limit
normality corresponds to a sufficiently strong asymptotic negligibility of the
summands, and, more precisely, to

$$\sum_{k=1}^{\nu_n} P(|X_{nk}| > \epsilon) \to 0,$$

or, equivalently, to

$$P(\max_k |X_{nk}| > \epsilon) \to 0.$$

Another illuminating characterization of NC (Raikov [39]) follows also from
III. Take for origin of values of summands the truncated first moments
$\int_{|x| < \epsilon} x \, dF_{nk}(x)$. Then $\mathcal{L}(S_{n,\nu_n} - b_n) \to \mathcal{N}(0, 1)$
for a convenient choice of $b_n$ if, and only if,
$\mathcal{L}\left(\sum_k X_{nk}^2\right) \to \mathcal{L}(1)$.

5. CLP in the case of dependence. Limit problems for sums of dependent
r.v.'s were considered for the first time by Markov [37], less than fifty years ago.
He extended the first two limit theorems of probability theory to the case of
events linked in chain, i.e., such that $P(A_k \mid A_1, \ldots, A_{k-1}) = P(A_k \mid A_{k-1})$.





However the crucial work in this field is the celebrated memoir by S. Bernstein
([3], 1927) which has the same historical importance for the dependence
case as that of Lyapunov has for the classical CLP.

Let $\{X_k\}$ be a sequence of r.v.'s. $E'X_k$ will denote the conditional expectation
of $X_k$, given $X_1, \ldots, X_{k-1}$. Consider the sequence of sums
$S_n = \sum_{k=1}^{n} X_k$, with $EX_k = 0$ and let
$\sigma_n = \sqrt{\sum_{k=1}^{n} \sigma^2(X_k)}$.

Bernstein's NC Theorem. If

(i) $\frac{1}{\sigma_n} \sum_{k=1}^{n} \sup |E'X_k| \to 0$, (ii) $\frac{1}{\sigma_n^2} \sum_{k=1}^{n} \sup |E'X_k^2 - EX_k^2| \to 0$,

and

(iii) $\frac{1}{\sigma_n^3} \sum_{k=1}^{n} \sup E'|X_k|^3 \to 0$,

then

$$\mathcal{L}\left(\frac{S_n}{\sigma_n}\right) \to \mathcal{N}(0, 1).$$

Obviously, if the $X_k$'s are independent, this theorem reduces to Lyapunov's
with $\delta = 1$. The method used is still that of ch.f.'s. From this result Bernstein
deduces various particular NC cases and, applying them to Markov chains,
extends the latter's results.

The unpleasant feature of the above theorem is the use of suprema of
conditional expectations and, except when the r.v.'s $X_k$ are bounded, one cannot
expect these suprema to be finite. On the other hand, the conditional expectations
are r.v.'s and it would be natural to associate their values with the corresponding
probabilities. This can be done and Bernstein's theorem can be improved in
various directions simultaneously. First it may be stated for sequences of sums
$S_{n,\nu_n}$ (this is trivial); next it extends to $\delta > 0$ instead of $\delta = 1$
(this contains completely Lyapunov's result but is of secondary interest). Then NC can be
replaced by asymptotic normality, i.e., by the existence of a sequence of normal
laws $\mathcal{N}(0, \sigma_n^2)$ such that the "distance" between $\mathcal{L}(S_{n,\nu_n})$
and $\mathcal{N}(0, \sigma_n^2)$ would approach zero as $n \to \infty$ (this is quite simple
to get). However, significant improvements are obtained on replacing suprema by
expectations. Let $F_n(x)$ be the d.f. of $S_{n,\nu_n}$ and $\Phi_n(x)$ be that of
$\mathcal{N}(0, \sigma_n^2)$. Then, taking $EX_{nk} = 0$, we have the following

NC Theorem. If (i) $\sum_k E|E'X_{nk}| \to 0$, (ii) $\sum_k E|E'X_{nk}^2 - EX_{nk}^2| \to 0$
and (iii) there exists a constant $\delta > 0$ such that $\sum_k E|X_{nk}|^{2+\delta} \to 0$,
then $F_n(x) - \Phi_n(x) \to 0$.

This theorem shows that, so far as moments of order higher than the second are
concerned, the NC condition is the same as in the case of independence. In this
last case the theorem is a slight improvement of that of Lyapunov. In 1941
conditions for LLN and NC were given (Loève [31], [32]) in the frame of the modern
CLP, without assuming the existence of moments; when independence is
assumed, they reduce to those given by Feller. Conditions for NC which, in the
case of independence, reduce to Lindeberg's, were then deduced in the particular
case of finite second moments and special cases of NC, including those
considered by S. Bernstein, were obtained.

The whole modern CLP had not been considered until lately (Loève [33-35]).
It appeared useful to extend the CLP to an "Asymptotic Central Problem"
(ACP); primarily, to the behavior of $\mathcal{L}(S_{n,\nu_n})$ as $n \to \infty$. This in
turn led to the introduction of laws "in a wide sense," i.e., with possible positive
probabilities for infinite values. To the sequence $\{\mathcal{L}(S_{n,\nu_n})\}$ is
associated another conveniently chosen sequence $\mathcal{L}_n$ of laws of sums; if
$\mathcal{L}_n \to \mathcal{L}$ or $\mathcal{L}_n = \mathcal{L}$ then the ACP reduces
to the CLP. The investigation uses an extension of the P. Lévy convergence
theorem for ch.f.'s and the modern CLP solutions are obtained as particular
cases. The case of sums of a random number of r.v.'s,⁷ as well as the
multidimensional case, are easily treated by the same methods [35].

Many new problems arise in ACP. The foremost corresponds to possible
relaxations of the uan condition. For instance, in the case of independence, the
relaxed condition

$$\max_k P\{|X_{nk} - Y_k| > \epsilon\} \to 0, \quad \text{for every } \epsilon > 0,$$

where $Y_1, Y_2, \ldots$ are independent, does not change, essentially, the nature
of the ACP. Yet, as soon as dependence is introduced, the whole outlook changes
and it would be interesting to investigate various new possibilities which thus
arise. On the other hand, conditions stricter than uan are of special interest
when independence is not assumed. The one which seems natural is the following:

$$\max_k \sup P'\{|X_{nk}| > \epsilon\} \to 0, \quad \text{for every } \epsilon > 0,$$

where $P'(A_{nk})$ denotes the conditional probability of the event $A_{n,k}$, given
$X_{n,1}, \ldots, X_{n,k-1}$. An immediate problem is whether this or an analogous
restriction enables us to find, not only sufficient, but also necessary conditions
for various convergences and various cases of dependence.

II. The Strong Central Limit Problem 

6. The Bernoulli case and its extension. A sequence $\{X_n\}$ such that the
corresponding sequence of laws converges does not, in general, determine a r.v.
X which might be considered, in some sense, as the limit of $X_n$. However, if we
define two r.v.'s X and X' such that $P(X \ne X') = 0$ as equivalent, then,
whenever $\mathcal{L}(X_m - X_n) \to \mathcal{L}(0)$ as $\frac{1}{m} + \frac{1}{n} \to 0$,
the sequence $\{X_n\}$ determines a unique r.v. X (up to an equivalence) for which
$P\{|X_n - X| > \epsilon\} \to 0$ for every $\epsilon > 0$. This X is the limit in
probability of $X_n$.

⁷ H. Robbins (Bull. Am. Math. Soc., Vol. 54 (1948), pp. 1151-1161) studied in detail the
case of independent and identically distributed $X_k$'s with $EX_k^2 < \infty$ and $\nu_n$,
independent of the $X_k$'s, with $E\nu_n^2 < \infty$.

Yet, an observed sequence of values of $\{X_n\}$ need not converge to the
observed value of X. For instance, let Y be a r.v. uniformly distributed over (0, 1).
Consider the sequence $\{D_n\}$ of partitions of (0, 1) into n equal subintervals
and to the k-th subinterval of $D_n$ attach the indicator $X_{nk}$ of the event when
Y falls within this subinterval. The sequence $X_{1,1}$; $X_{2,1}$, $X_{2,2}$; $X_{3,1}$,
$X_{3,2}$, $X_{3,3}$, ... converges in probability to zero since
$P(X_{nk} \ne 0) = \frac{1}{n}$, for k = 1, 2, ..., n, approaches zero as $n \to \infty$.
On the other hand, observed values of the $X_{nk}$'s, for k = 1, 2, ..., n, will
contain n - 1 zeros and a one, except in cases of total probability zero. Hence,
except in these cases, any observed sequence
will contain infinitely many zeros and infinitely many ones and will not converge.
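
The example above can be reproduced for one observed value of Y. The following sketch lists, for a fixed y in (0, 1), the rows of indicators attached to the partitions into n equal subintervals; the half-open convention for the subintervals and the name `indicator_rows` are assumptions of the illustration.

```python
def indicator_rows(y, n_max):
    """Rows (X_{n,1}, ..., X_{n,n}) for n = 1..n_max, where X_{n,k} = 1 if y
    lies in the k-th of n equal subintervals of (0, 1), taken here as
    (k-1)/n <= y < k/n."""
    rows = []
    for n in range(1, n_max + 1):
        hit = min(int(y * n), n - 1)   # 0-based index of the subinterval containing y
        rows.append([1 if k == hit else 0 for k in range(n)])
    return rows


for row in indicator_rows(y=0.37, n_max=6):
    print(row)
```

Every row contains exactly one 1, so the concatenated sequence never settles, although the chance that a given entry of row n equals 1 is only 1/n.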

The Bernoulli theorem means only that $f_n = \frac{S_n}{n}$ converges in probability
to p. Borel showed, in a fundamental memoir ([5], 1909), that Bernoulli's
statement is too weak, and, in fact, that observed values of $f_n$ converge to p,
except in cases of total probability zero. Borel's proof is based upon a direct
analysis of the de Moivre-Laplace approach to NC. Thus a new domain in
probability theory was opened to exploration.

First Strong Limit Theorem. In the Bernoulli case

$$P(\lim_n f_n = p) = 1.$$

This leads to the introduction in probability theory of the notion of almost
sure (a.s.) convergence:

$$X_n \xrightarrow{\text{a.s.}} X \quad \text{if} \quad P\{\lim_n X_n = X\} = 1,$$

or, equivalently, if for every $\epsilon > 0$,

$$P\{|X_{n+k} - X| > \epsilon \text{ for } k = 1, \text{ or } 2, \text{ or } \ldots \text{ ad inf.}\} \to 0 \quad \text{as } n \to \infty.$$

If we denote by $A_n$ the event $|X_n - X| > \epsilon$, we see that we are concerned
here with
$P = P(\text{realization of infinitely many events } A_n) = \lim_{n \to \infty} \lim_{\nu \to \infty} P(A_{n+1} \cup \cdots \cup A_{n+\nu})$.⁸
From Boole's inequality

$$P(A_{n+1} \cup \cdots \cup A_{n+\nu}) \le \sum_{k=n+1}^{n+\nu} P(A_k)$$

follows, at once, the fundamental Borel-Cantelli Lemma. If $\sum_n P(A_n) < \infty$
then P = 0. This lemma can be extended, using sharper inequalities (Loève [32]).

⁸ Already Poincaré considered such probabilities in his investigation of "recurrence"
and this, before the notion of completely additive measures was born.





Now apply the Tchebicheff-Markov inequality

$$P\{|X_n - X| > \epsilon\} \le \frac{E|X_n - X|^r}{\epsilon^r}, \qquad r > 0,$$

and the Cantelli criterion follows: if for some r > 0, $\sum_n E|X_n - X|^r < \infty$,
then $X_n \xrightarrow{\text{a.s.}} X$.

Applying it, with r = 4, to the Bernoulli case, Cantelli [6] obtained an almost
immediate proof of Borel's result. An even simpler proof is as follows:
$\sum_n E|f_{n^2} - p|^2 < \infty$ since $E(f_n - p)^2 = \frac{p(1-p)}{n}$, hence
$f_{n^2} - p \xrightarrow{\text{a.s.}} 0$. Moreover, $|f_\nu - f_{n^2}| \le \frac{2}{n}$ for
$0 \le \nu - n^2 \le 2n$, hence $f_\nu - f_{n^2} \to 0$ in the usual sense,
uniformly in $\nu$, and the theorem is proved. This last method applies as well to
sequences of dependent events $\{B_n\}$, which constitute a natural extension of
the Bernoulli case. Let

$$p_1(n) = \frac{1}{n} \sum_{k=1}^{n} P(B_k), \qquad p_2(n) = \frac{1}{\binom{n}{2}} \sum_{1 \le k < l \le n} P(B_k B_l),$$

$\delta_n = p_2(n) - p_1^2(n)$ (in the Bernoulli case $\delta_n = 0$!). It is very easy to
show that $f_n - p_1(n) \to 0$ in probability if, and only if, $\delta_n \to 0$; this
extends the Bernoulli theorem. Moreover, if $n|\delta_n| \le C < \infty$ then
$f_n - p_1(n) \xrightarrow{\text{a.s.}} 0$ (Loève [31]), and Dvoretzky [10] proved that it is
enough to have $\sum_n \frac{|\delta_n|}{n} < \infty$. Thus we
have a simple extension of Borel's result.
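
The deterministic step in the simpler proof of Borel's result, namely that the frequency $f_\nu$ stays within 2/n of $f_{n^2}$ while $\nu$ runs from $n^2$ to $n^2 + 2n$, holds for any sequence of zeros and ones and can be checked directly. The sketch below does so on a simulated Bernoulli sequence; the seed, the value p = 0.3 and the name `max_gap_to_square` are illustrative assumptions.

```python
import random


def max_gap_to_square(xs, n):
    """Largest |f_v - f_{n^2}| over n^2 < v <= n^2 + 2n, where
    f_v = (x_1 + ... + x_v) / v and the x_k are 0 or 1."""
    s = sum(xs[: n * n])          # S_{n^2}
    f_sq = s / (n * n)
    worst = 0.0
    for v in range(n * n + 1, n * n + 2 * n + 1):
        s += xs[v - 1]            # now s = S_v
        worst = max(worst, abs(s / v - f_sq))
    return worst


rng = random.Random(0)            # fixed seed, for reproducibility only
p = 0.3
xs = [1 if rng.random() < p else 0 for _ in range(60 * 60 + 2 * 60)]
for n in (5, 20, 60):
    print(n, max_gap_to_square(xs, n), 2 / n)
```

The bound does not depend on p or on independence, which is why the same argument carries over to the dependent events $\{B_n\}$ once convergence along the squares has been secured.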

The method used by Borel, while uselessly complicated in view of the result
obtained, is very powerful and, by sharpening it, the law of the iterated logarithm
(Khintchine [18]) follows.

Second Strong Limit Theorem. In the Bernoulli case

$$P\left\{\limsup_n \frac{S_n - ES_n}{\sigma_n (2 \log\log \sigma_n)^{1/2}} = 1\right\} = 1,$$

where $\sigma_n = \sigma(S_n)$.

Let us use the following terminology (P. Lévy [26]). A non-decreasing
sequence $\{\phi_n\}$ of positive numbers belongs to the lower class L if the probability
that $S_n > \phi_n$ for infinitely many n is 1, and it belongs to the upper class U
if this probability is 0. The following criterion (Kolmogorov) applies: In the

Bernoulli case (^> a ) belongs to L or U, respectively, according as E*> ~T 4>* = oo 

O" n 

or $< \infty$. Clearly this result contains Khintchine's LIT.

7. The general case. The question of domains of validity of the obtained
results arises immediately and thus the SCLP appears in its present form. Let
$S_n = \sum_{k=1}^{n} X_k$ be sums of r.v.'s $X_k$, independent or not. Find conditions
for 1° a.s. convergence of $\frac{S_n}{n}$ or, more generally [31], of $\frac{S_n}{a_n}$,
$a_n \uparrow \infty$ (SLLN); 2° the law of the iterated logarithm (LIT) and, more
generally, criteria for classifying sequences $\{\phi_n\}$.

The second problem, in the case of independent summands, possesses almost
complete solutions due, respectively, to Kolmogorov [17] and to Feller [13].

a. If $\sup |X_k| = o\left(\sigma_n / (\log\log\sigma_n)^{1/2}\right)$ for $k \le n$, then LIT holds.

b. If $\sup |X_k| = O\left(\sigma_n / (\log\log\sigma_n)^{3/2}\right)$ for $k \le n$, then the
criterion for the Bernoulli case continues to hold. (Feller also gave sharper criteria.)

In the case of dependent summands general results were obtained by P. Lévy
[26] and for Markov chains by Doeblin [7]. The problem belongs (at present)
to the domain of NC; it is complicated and pries deeply into the behavior of
probabilities as $n \to \infty$. Yet, in the case of independence, the dichotomy into
classes L and U is more general as shown by the following property (P. Lévy
[26]): If $\{S_n\}$ is a sequence of consecutive sums of independent r.v.'s, and cannot
be reduced by adding constants to an a.s. convergent sequence, then, for any given
sequence $\{c_n\}$ of sure numbers, $P(S_n > c_n$ for an infinity of values of n) = 0
or 1.

The SLLN problem seems easier. Nevertheless it is far from being solved;
we don't even know necessary and sufficient conditions for the SLLN in the case
of independent summands in terms of individual d.f.'s.⁹ The essential tools are,
besides the fundamental Borel-Cantelli lemma, 1° the truncation method
together with the convergence in r-mean: $X_n \xrightarrow{r} X$ if
$E|X_n - X|^r \to 0$ (r > 0), 2° the Kronecker lemma: If $\sum_k x_k / a_k$ is
convergent, then $\frac{1}{a_n} \sum_{k=1}^{n} x_k \to 0$ ($a_n \uparrow \infty$). It provides
a possibility of transforming problems about the SLLN into those of a.s.
convergence of series of r.v.'s, at least when sufficient conditions are sought for.
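
The Kronecker lemma quoted above is a statement about numerical sequences and can be watched at work on an example. The sketch below takes $a_k = k^2$ and $x_k = \sqrt{k}$, so that $\sum_k x_k / a_k = \sum_k k^{-3/2}$ converges although the $x_k$ themselves grow. Both choices, and the name `kronecker_terms`, are assumptions made for illustration.

```python
def kronecker_terms(x, a):
    """Return two lists: the partial sums of x_k / a_k, and the normed
    sums (1/a_n) * (x_1 + ... + x_n)."""
    series, averages = [], []
    s_ratio = 0.0
    s_plain = 0.0
    for xk, ak in zip(x, a):
        s_ratio += xk / ak
        s_plain += xk
        series.append(s_ratio)
        averages.append(s_plain / ak)
    return series, averages


n = 100000
a = [float(k) ** 2 for k in range(1, n + 1)]      # a_k = k^2, increasing to infinity
x = [float(k) ** 0.5 for k in range(1, n + 1)]    # x_k = sqrt(k), so x_k / a_k = k^(-3/2)
series, averages = kronecker_terms(x, a)
for m in (10, 1000, 100000):
    print(m, series[m - 1], averages[m - 1])
```

The first column stabilizes while the second tends to zero, which is the passage from convergence of a series to a law of large numbers that the text exploits.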

In the case of independent summands one can start with the following
property of series (Lévy [23]): a.s. convergence of $\sum_k X_k$ is equivalent to
convergence in probability. (It can be shown that this property holds also for
certain classes of dependent summands.) On the other hand, convergence in q.m.
(r = 2) entails convergence in probability. Hence, when $EX_k^2 < \infty$, taking
$EX_k$ as the origin of values of $X_k$, it follows that If $\sum_n \sigma^2(X_n) < \infty$,
then $S_n$ a.s. converges. Kolmogorov proved this result using his celebrated
inequality which considerably strengthens that of Tchebicheff:

$$P\left[\max_{k \le n} |S_k| > \epsilon\right] \le \frac{\sigma^2(S_n)}{\epsilon^2}.$$

This inequality has been extended by P. Lévy [26], and by Loève [32] to
dependent summands and conditions for a.s. convergence were deduced from it.
If the $EX_k^2$ are not finite, the truncation method is applied. Put $X'_k = X_k$
if $|X_k| \le 1$ and $= 0$ if $|X_k| > 1$. Then (Khintchine-Kolmogorov) $\sum_n X_n$,
where $X_n$ are independent r.v.'s, is a.s. convergent if, and only if,
$\sum_n P(X_n \ne X'_n)$, $\sum_n E(X'_n)$, $\sum_n \sigma^2(X'_n)$ converge.

⁹ A first step in this direction is due to U. V. Prokhorov, "On the strong law of large
numbers" (in Russian), Dokl. Ak. Nauk, Vol. 69 (1949), pp. 607-610. See also a paper by
K. L. Chung to appear in the Proceedings of the Second Berkeley Symposium.
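
Kolmogorov's inequality, in its usual form $P[\max_{k \le n} |S_k| > \epsilon] \le \sigma^2(S_n)/\epsilon^2$ for independent zero-mean summands, can be compared with a simulated frequency. The sketch below assumes steps equal to +1 or -1 with probability 1/2, so that $\sigma^2(S_n) = n$; the step law, seed, sample sizes and function names are illustrative assumptions, and a Monte Carlo frequency is of course only an estimate.

```python
import random


def max_abs_partial_sum(steps):
    """max over k <= n of |S_k| for the partial sums S_k of the given steps."""
    s, worst = 0.0, 0.0
    for x in steps:
        s += x
        worst = max(worst, abs(s))
    return worst


def exceedance_frequency(n, eps, trials, seed=0):
    """Monte Carlo frequency of {max_{k<=n} |S_k| > eps} for independent
    steps equal to +1 or -1 with probability 1/2 (zero mean, variance 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        steps = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        if max_abs_partial_sum(steps) > eps:
            hits += 1
    return hits / trials


n, eps = 100, 25.0
bound = n / eps ** 2              # sigma^2(S_n) / eps^2 with sigma^2(S_n) = n
print(exceedance_frequency(n, eps, trials=2000), "<=", bound)
```

The point of the inequality is that the bound for the whole maximum is the one Tchebicheff's inequality gives for the last sum $S_n$ alone.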

It is not difficult to obtain conditions for series of dependent summands.
Let $q_n(t) = P\{|X_n| > t\}$, $\xi_n = \int_{-\epsilon}^{+\epsilon} x \, dF'_n(x)$, where
$F'_n(x)$ is the conditional d.f. of $X_n$, given $X_1, \ldots, X_{n-1}$. If
$\sum_n \int_0^{\epsilon} t\, q_n(t)\, dt < \infty$ for an $\epsilon > 0$, then
$\sum_n (X_n - \xi_n)$ a.s. converges.

By using Kronecker's lemma the results above yield immediately sufficient
conditions for the SLLN. Those which come from the last one would in turn
yield without difficulty the following: Let $a_n \uparrow \infty$ and
$\eta_n = \int_{-\epsilon a_n}^{+\epsilon a_n} x \, dF'_n(x)$.
If $\sum_n q_n(a_n t) \le q(t)$ and $\int_0^{\epsilon} t\, q(t)\, dt < \infty$, then
$\frac{1}{a_n} \sum_{k=1}^{n} (X_k - \eta_k) \xrightarrow{\text{a.s.}} 0$.

Take now the particular case: $a_n = n$, and $X_k$'s independent and identically
distributed. From the stated result follows:

1. If $EX_k = m$ exists, then $\frac{1}{n} \sum_{k=1}^{n} X_k \xrightarrow{\text{a.s.}} m$ and conversely (Kolmogorov).

2. If 0 < r < 2, $r \ne 1$, $E|X_k|^r < \infty$ and $\lim_{a \to \infty} \int_{-a}^{+a} x \, dF_n(x) = 0$, then
$\frac{1}{n^{1/r}} \sum_{k=1}^{n} X_k \xrightarrow{\text{a.s.}} 0$ (Marcinkiewicz).


Other conditions for SLLN, in the case of dependence, are known (Lévy [27],
Loève [32]).

The above result of Kolmogorov is a particular case of the celebrated ergodic
theorem (Birkhoff [3]) which can be considered as a SLLN for a special case of
dependence. Let $A_n$ be an event defined on the set $\{X_{k_1}, \ldots, X_{k_n}\}$ and
let $A_n^{(m)}$ be an event defined in the same manner on the translated set
$\{X_{k_1+m}, \ldots, X_{k_n+m}\}$. The sequence $\{X_k\}$ is called stationary if
$P(A_n^{(m)}) = P(A_n)$ for every finite set $\{k_1, \ldots, k_n\}$ and every finite m.
The ergodic theorem states that If the sequence $\{X_k\}$ is stationary and
$E|X_k| < \infty$, then $\frac{1}{n} \sum_{k=1}^{n} X_k$ converges a.s.¹⁰

However an unsatisfactory feature of Birkhoff's theorem (and of its
extensions) is that the conditions are not asymptotic (they have to be satisfied for
every n and not for $n \to \infty$) while the conclusion is an asymptotic one. Let us
only mention that more satisfactory ones, at least from this point of view,
which contain the previous ones, can be found.

¹⁰ For about fifteen years Khintchine, Kolmogorov, Wiener, Yosida and Kakutani,
F. Riesz, worked to simplify the proof of this theorem. It is only lately that its domain
of validity has been extended by Hurewicz, by Halmos, and by Dunford and Miller. See
also a forthcoming paper by the author in the Proceedings of the Second Berkeley
Symposium.





The bird’s-eye view above of the SCLP shows that this problem is only in a 
tentative stage, perhaps because no adequately powerful methods or no ade¬ 
quately general approach to the problem had been found until now. 

REFERENCES 

[1] G. Bawly, "Über einige Verallgemeinerungen der Grenzwertsätze der Wahrscheinlichkeitsrechnung," Rec. Math. Moscow, Vol. 43 (1936), pp. 917-929.

[2] J. Bernoulli, Ars conjectandi, 1713.

[3] S. Bernstein, "Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes," Math. Ann., Vol. 97 (1927), pp. 1-59.

[4] G. D. Birkhoff, "Proof of the ergodic theorem," Proc. Nat. Acad. Sc., Vol. 17 (1931), pp. 650-665.

[5] E. Borel, "Sur les probabilités dénombrables et leurs applications arithmétiques," Circ. Mat. d. Palermo, Vol. 26 (1909), pp. 247-271.

[6] F. P. Cantelli, "Sulla probabilità come limite della frequenza," Rend. Accad. d. Lincei, Vol. 26 (1917), pp. 39-45.

[7] A. De Moivre, Miscellanea analytica, 1730.

[8] W. Doeblin, "Sur les propriétés asymptotiques ... de chaînes simples," Bull. Math. Soc. Roum. Sc. (1937), pp. 1-120.

[9] W. Doeblin, "Sur les sommes d'un grand nombre de variables aléatoires indépendantes," Bull. Sc. Math., Vol. 63 (1939), pp. 23-64.

[10] A. Dvoretzky, "On the strong stability of a sequence of events," Annals of Math. Stat., Vol. 20 (1949), pp. 296-299.

[11] W. Feller, "Ueber den Zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung," Math. Zeit., Vol. 40 (1935), pp. 521-539; Vol. 42 (1937), pp. 301-312.

[12] W. Feller, "Ueber das Gesetz der grossen Zahlen," Acta Litt. Ac. Scient., Vol. 8 (1937), pp. 191-201.

[13] W. Feller, "The general form of the so-called law of the iterated logarithm," Trans. Am. Math. Soc., Vol. 54 (1943), pp. 373-402.

[14] B. Gnedenko, "On convergence of distribution laws of independent random variables" (in Russian), Dokl. Ak. Nauk, Vol. 18 (1938), pp. 231-234.

[15] B. Gnedenko, "On the theory of limit theorems for sums of independent random variables" (in Russian), Bull. Ac. Sci. URSS (1939), pp. 181-232, pp. 643-647.

[16] A. Kolmogoroff, "Ueber die Summen durch den Zufall bestimmter unabhängiger Grössen," Math. Ann., Vol. 99 (1928), pp. 309-319; Vol. 102 (1929), pp. 484-489.

[17] A. Kolmogoroff, "Ueber das Gesetz des iterierten Logarithmus," Math. Ann., Vol. 101 (1929), pp. 126-135.

[18] A. Kolmogoroff, "Sulla forma generale di un processo stocastico omogeneo," Rend. Accad. d. Lincei, Vol. 15 (1932), pp. 805-806, pp. 866-869.

[19] A. Khintchine, "Ueber einen Satz der Wahrscheinlichkeitsrechnung," Fund. Math., Vol. 6 (1924), pp. 9-20.

[20] A. Khintchine, "Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze," Rec. Math. Moscow, Vol. 44 (1937), pp. 79-119.

[21] P. S. Laplace, Traité des Probabilités, 1812.

[22] P. Lévy, Calcul des Probabilités, Gauthier-Villars, 1925.

[23] P. Lévy, "Sur les séries dont les termes sont des variables éventuelles indépendantes," Studia Math., Vol. 3 (1931), pp. 117-155.

[24] P. Lévy, "Sur les intégrales dont les éléments sont des variables aléatoires indépendantes," Ann. d. Scuola Nor. di Pisa, 3 (1934), pp. 337-366.

[25] P. Lévy, "Propriétés asymptotiques des sommes de variables aléatoires indépendantes ou enchaînées," Jour. Math. Pures Appl., Vol. 14 (1935), pp. 347-402.





[26] P. Lévy, "La loi forte des grands nombres pour les variables aléatoires enchaînées," Jour. Math. Pures Appl., Vol. 15 (1936), pp. 11-24.

[27] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, 1937.

[28] A. Liapounoff, "Sur une proposition de la théorie des probabilités," Bull. Acad. Sci. St. Petersbourg, (5), Vol. 13 (1900), pp. 350-386.

[29] A. Liapounoff, "Nouvelle forme du théorème sur la limite de probabilité," Mem. Ac. St. Petersbourg, (8), Vol. 12 (1901).

[30] J. Lindeberg, "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung," Math. Zeit., Vol. 15 (1922), pp. 211-225.

[31] M. Loève, "La tendance centrale pour des variables aléatoires liées," C. R. Ac. Sc. Paris, Vol. 212 (1940).

[32] M. Loève, "Étude asymptotique des sommes de variables aléatoires liées," Jour. Math. Pures Appl., Vol. 24 (1945), pp. 249-318.

[33] M. Loève, "Sur l'équivalence asymptotique des lois," C. R. Ac. Sc. Paris, Vol. 227 (1948), pp. 1335-1337.

[34] M. Loève, "On the Central Probability problem," Proc. Nat. Acad. Sci., Vol. 35 (1949), pp. 328-332.

[35] M. Loève, "On sets of Probability laws and their limit elements," Statistical Series, Vol. 1, pp. 53-88, University of California Press, Berkeley, California.

[36] J. Marcinkiewicz, "Sur les fonctions indépendantes," Fund. Math., Vol. 30 (1938), pp. 349-364.

[37] A. Markoff, Calculus of probabilities (Russian), 1913.

[38] Poisson, Recherches sur la Probabilité des Jugements, 1837, pp. 205-207.

[39] D. Raikoff, "On connection between the central limit theorem of probability theory and the law of large numbers," Bull. Ac. Sci. URSS, Ser. Math., (1938), pp. 323-338.

[40] P. L. Tchebicheff, Oeuvres, Vol. I, pp. 687-694; Vol. II, pp. 481-492.



A RANDOM VARIABLE RELATED TO THE SPACING OF
SAMPLE VALUES

By B. Sherman 1 

University of Southern California 

1. Introduction and summary. Let $x$ be a random variable with continuous distribution function $F(x)$. Then $y = F(x)$ is a random variable uniformly distributed over $[0, 1]$. If $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from the population $F(x)$ then $y_1, y_2, \cdots, y_n$ ($y_i = F(x_i)$) is an ordered sample of $n$ values from a uniform distribution over $[0, 1]$. For $n$ large it is reasonable to expect that the $y_i$ should be fairly uniformly spaced. Measures of the deviation from uniform spacing can be devised in various ways. Thus Kimball [2] has studied the random variable

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, conjecturing that $\alpha$ is asymptotically normally distributed. Moran [3] has studied the random variable

$$\beta = \sum_{i=1}^{n+1} \bigl(F(x_i) - F(x_{i-1})\bigr)^2,$$

which differs from $\alpha$ only by the quantity $-2/(n + 1) + (n + 1)^{-2}$, and has proved that $\beta$ is asymptotically normally distributed. Somewhat related to these two random variables is the quantity $\omega^2$ introduced by Smirnoff [4]. This is

$$\omega^2 = n \int_{-\infty}^{\infty} \bigl(F(x) - F^*(x)\bigr)^2 \, dF(x),$$

although it is slightly more generally defined in Smirnoff's paper. Here $F^*(x)$ is the sample distribution function ([1], page 325) of a sample of $n$ values from the population with continuous distribution function $F(x)$. The variable $\omega^2$ may be written ([1], page 451)

$$\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left(F(x_i) - \frac{2i - 1}{2n}\right)^2 .$$

$(2i - 1)/2n$ is the midpoint of the interval $((i - 1)/n, \; i/n)$. Thus, if $[0, 1]$ is partitioned into $n$ equal subintervals then $\omega^2$ measures the deviation of the sample values $y_i = F(x_i)$, $i = 1, 2, \cdots, n$, from the midpoints of these intervals. Smirnoff has investigated the asymptotic behavior of $\omega^2$, obtaining a rather complicated non-normal asymptotic distribution.

1 I wish to thank Professors J. W. Tukey and S. S. Wilks for their helpful suggestions and criticism.




It is possible to construct a definition of deviation from uniform spacing which permits a broader investigation than these random variables. This is

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where again $x_0 = -\infty$ and $x_{n+1} = +\infty$ and $F(x)$ is a continuous distribution function. (In Theorems 3 and 4 it is assumed additionally that $F'(x)$ exists and is continuous except for a finite number of points.) It is to be noted that

$$0 \le \omega_n \le 1.$$

Generally speaking use of the absolute value in circumstances like this is an undesirable procedure, but it turns out that $\omega_n$ is relatively easy to handle, allowing a fairly simple calculation of its moments (which are independent of $F(x)$). These are ($\mu = \min(k, n)$)

$$E(\omega_n^k) = \binom{n+k}{k}^{-1} \sum_{s=0}^{\mu-1} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k}.$$

Thus in particular the mean of $\omega_n$ is

$$E(\omega_n) = \left(\frac{n}{n+1}\right)^{n+1}$$

and the variance is

$$D^2(\omega_n) = E(\omega_n^2) - E^2(\omega_n) = \frac{2n^{n+2} + n(n-1)^{n+2}}{(n+2)(n+1)^{n+2}} - \left(\frac{n}{n+1}\right)^{2n+2} \approx \frac{2e - 5}{e^2} \cdot \frac{1}{n}.$$

These results will be established in Theorem 1. From the moments the characteristic function of $\omega_n$ may be obtained, and indeed in finite terms. From the characteristic function the distribution function of $\omega_n$ may be readily calculated. The distribution function is written out explicitly at the end of Theorem 1.
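The closed form for the moments can be evaluated exactly with rational arithmetic. The following is a minimal sketch, not part of the original paper: the function names `moment` and `mean_and_variance` are illustrative, and the formula coded is the one stated in Theorem 1 (with $\mu = \min(k, n)$). It compares $n D^2(\omega_n)$ with the asymptotic constant $(2e - 5)/e^2$.

```python
from fractions import Fraction
from math import comb, e


def moment(n: int, k: int) -> Fraction:
    """Exact k-th moment of omega_n from the closed form of Theorem 1."""
    mu = min(k, n)
    total = sum(
        comb(n + 1, s + 1) * comb(k - 1, s) * Fraction(n - s, n + 1) ** (n + k)
        for s in range(mu)
    )
    return total / comb(n + k, k)


def mean_and_variance(n: int):
    """Return (E(omega_n), D^2(omega_n)) as exact fractions."""
    m1 = moment(n, 1)
    m2 = moment(n, 2)
    return m1, m2 - m1 ** 2


if __name__ == "__main__":
    limit_const = (2 * e - 5) / e ** 2  # asymptotic value of n * variance
    for n in (5, 50, 500):
        m1, var = mean_and_variance(n)
        print(n, float(m1), n * float(var), limit_const)
```

For $n = 1$ the statistic reduces to $|y_1 - \tfrac12|$, whose mean $1/4$ and second moment $1/12$ agree with the formula; this gives a quick sanity check of the sketch.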

To determine the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)},$$

it is sufficient to examine the behaviour as $n \to \infty$ of the moments of this variable or equivalently the moments of the variable

$$\left(\frac{e^2 n}{2e - 5}\right)^{1/2} \left(\omega_n - \frac{1}{e}\right).$$

For it is easy to show that if the moments of the standardized variable approach the moments of a unique distribution function $F(x)$ then the distribution function of the standardized variable approaches $F(x)$. In this manner it is proved





in Theorem 2 that the distribution function of the standardized variable approaches normality.

Since the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

is known it may be used as a test for goodness of fit if the number of sample values is large. Thus suppose $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from some population and we wish to test the hypothesis that the population has the distribution function $F(x)$. Then we calculate the quantity

$$\frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right] = X_n,$$

and if this quantity exceeds a certain value which depends on the level of significance at which we are working we reject the hypothesis. Let us say that $P(X_n > A) = B$. The probability of rejecting the hypothesis when it is indeed true is then precisely $B$ and this is small if $A$ is sufficiently large. But suppose that the hypothesis is false and the sample values come from a population whose distribution function $G(x) \not\equiv F(x)$. Then we would desire the following property to hold for the random variable $X_n$, namely, for any fixed positive $A$ the probability that $X_n$ exceeds $A$ approaches 1 as $n \to \infty$. For in this case (and when $n$ is large) we are almost certain to reject the null hypothesis when it is false. A test for goodness of fit which satisfies this criterion, i.e. where the probability of rejection approaches 1 as $n \to \infty$ when the null hypothesis is false, is called consistent by Wald and Wolfowitz [5]. We wish to prove then that the test for goodness of fit which uses the random variable $X_n$ is consistent. To express the matter formally we wish to prove that (the probability density element of $x_1, x_2, \cdots, x_n$ is $n! \, dG(x_1) \, dG(x_2) \cdots dG(x_n)$ in the region

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty$$

and zero outside that region)



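$$\lim_{n \to \infty} n! \int \cdots \int_{D_1} dG(x_1) \, dG(x_2) \cdots dG(x_n) = \begin{cases} \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_A^{\infty} e^{-t^2/2} \, dt & \text{if } F(x) \equiv G(x), \\[2ex] 1 & \text{if } F(x) \not\equiv G(x), \end{cases}$$

where $D_1$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty, \qquad \frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right] > A.$$

The first assertion here is proved in Theorem 2. The second assertion is equivalent to proving that for any fixed positive $A$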




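$$\lim_{n \to \infty} n! \int \cdots \int_{D_2} dG(x_1) \, dG(x_2) \cdots dG(x_n) = 0, \tag{0.1}$$

where $D_2$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty,$$

$$E(\omega_n) - A D(\omega_n) < \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| < E(\omega_n) + A D(\omega_n),$$

when $F(x) \not\equiv G(x)$. Now $D(\omega_n)$ is of order $n^{-1/2}$, $E(\omega_n) = e^{-1} +$ terms of order $n^{-1}$ and $A$ is fixed. Hence it is sufficient to show that, if $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from a population with distribution function $G(x)$, then the random variable

$$\Omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|$$

(it is necessary to draw a distinction between $\omega_n$ and $\Omega_n$ since $F(x) \not\equiv G(x)$) has a mean $L_n \to L \ne e^{-1}$ and a variance $D^2(\Omega_n) \to 0$. For then we have, when $n$ is large enough so that the interval

$$[E(\omega_n) - A D(\omega_n), \; E(\omega_n) + A D(\omega_n)]$$

falls outside $[L - \tfrac{1}{2} |L - e^{-1}|, \; L + \tfrac{1}{2} |L - e^{-1}|]$ and $|L_n - L| < \tfrac{1}{4} |L - e^{-1}|$,

$$\begin{aligned} P\bigl(E(\omega_n) - A D(\omega_n) < \Omega_n < E(\omega_n) + A D(\omega_n)\bigr) &\le P\bigl(|\Omega_n - L| \ge \tfrac{1}{2} |L - e^{-1}|\bigr) \\ &\le P\bigl(|\Omega_n - L_n| \ge \tfrac{1}{4} |L - e^{-1}|\bigr) \\ &\le \frac{E(|\Omega_n - L_n|)}{\tfrac{1}{4} |L - e^{-1}|} \le \frac{D(\Omega_n)}{\tfrac{1}{4} |L - e^{-1}|}, \end{aligned}$$

and this implies (0.1).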

But now in Theorem 3 it is shown that the mean of the random variable $\Omega_n$ is (writing $k(x) = GF^{-1}(x)$, $k(x)$ a monotonic function such that $k(0) = 0$ and $k(1) = 1$)

$$E(\Omega_n) = \int_0^{n/(n+1)} \left[ 1 - k\!\left(x + \frac{1}{n+1}\right) + k(x) \right]^n dx.$$

This expression approaches

$$\int_0^1 e^{-k'(x)} \, dx$$

and this integral can assume the value $e^{-1}$, which is its minimum relative to the class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, only when $k(x) \equiv x$, i.e. $F(x) \equiv G(x)$. Finally in Theorem 4 we prove that $D^2(\Omega_n) \to 0$ and thus it is established that the test for goodness of fit based on $X_n$ is consistent.
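The computation of the test quantity $X_n$ described above can be sketched as follows. This is an illustration under stated assumptions rather than a procedure given in the paper: the function names `omega` and `standardized` are hypothetical, the hypothesized distribution function is passed in as a Python callable, and the exact mean and variance of $\omega_n$ quoted in the introduction are used for the standardization.

```python
from fractions import Fraction
from math import sqrt
import random


def omega(sample, cdf):
    """Half the summed absolute deviation of the n + 1 spacings from 1/(n + 1)."""
    ys = sorted(cdf(x) for x in sample)
    n = len(ys)
    pts = [0.0] + ys + [1.0]          # F(x_0) = 0 and F(x_{n+1}) = 1
    gap = 1.0 / (n + 1)
    return 0.5 * sum(abs(pts[i] - pts[i - 1] - gap) for i in range(1, n + 2))


def standardized(sample, cdf):
    """X_n = (omega_n - E(omega_n)) / D(omega_n), using the exact mean and variance."""
    n = len(sample)
    mean = Fraction(n, n + 1) ** (n + 1)
    second = Fraction(2 * n ** (n + 2) + n * (n - 1) ** (n + 2),
                      (n + 2) * (n + 1) ** (n + 2))
    var = second - mean ** 2
    return (omega(sample, cdf) - float(mean)) / sqrt(float(var))


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(200)]
    # Hypothesis F(x) = x on [0, 1]; compare X_n with a normal critical value.
    print(standardized(data, lambda x: x))
```

A perfectly evenly spaced sample gives $\omega_n = 0$, so large positive values of the standardized quantity indicate irregular spacing relative to the hypothesized $F(x)$.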

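2. Moments and asymptotic distribution of $\omega_n$.

Theorem 1. Let $F(x)$ be a continuous distribution function. If $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from the population whose distribution function is $F(x)$ then the random variable

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, has the moments

$$\alpha_{nk} = E(\omega_n^k) = \binom{n+k}{k}^{-1} \sum_{s=0}^{\mu-1} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k},$$

where $\mu = \min(k, n)$.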

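The probability density element of the $x_i$ is ([6], page 90)

$$n! \, dF(x_1) \, dF(x_2) \cdots dF(x_n)$$

in the domain $D_x$: $-\infty < x_1 < x_2 < \cdots < x_n < +\infty$ and zero outside of this domain. Then

$$\alpha_{nk} = n! \int \cdots \int_{D_x} \omega_n^k \, dF(x_1) \, dF(x_2) \cdots dF(x_n).$$

If we make the transformation $y_i = F(x_i)$, $i = 1, 2, \cdots, n$, then

$$\alpha_{nk} = n! \int \cdots \int_{D_y} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| y_i - y_{i-1} - \frac{1}{n+1} \right| \right]^k dy_1 \, dy_2 \cdots dy_n,$$

where $D_y$ is the domain $0 < y_1 < y_2 < \cdots < y_n < 1$, thus indicating that the moments of $\omega_n$ (and therefore also the distribution function of $\omega_n$) are independent of $F(x)$. Here $y_0 = 0$ and $y_{n+1} = 1$. The transformation

$$\begin{aligned} u_1 &= y_1, & y_1 &= u_1, \\ u_2 &= y_2 - y_1, & y_2 &= u_1 + u_2, \\ &\;\;\vdots & &\;\;\vdots \\ u_n &= y_n - y_{n-1}, & y_n &= u_1 + u_2 + \cdots + u_n, \\ u_{n+1} &= y_{n+1} - y_n, & y_{n+1} &= u_1 + u_2 + \cdots + u_n + u_{n+1} = 1, \end{aligned}$$

whose Jacobian is 1, then yields

$$\alpha_{nk} = n! \int \cdots \int_{D_u} \left[ \frac{1}{2} \sum_{i=1}^{n} \left| u_i - \frac{1}{n+1} \right| + \frac{1}{2} \left| \frac{n}{n+1} - (u_1 + u_2 + \cdots + u_n) \right| \right]^k du_1 \cdots du_n,$$

where $D_u$ is the domain $\sum_{i=1}^{n} u_i < 1$, $u_i > 0$, $i = 1, 2, \cdots, n$.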

The domain $D_u$ can be regarded as the union of $2^{n+1} - 2$ subdomains in the following way. First the hyperplane $u_1 + u_2 + \cdots + u_n = n/(n + 1)$ divides the domain into two parts. In the part of the domain below the hyperplane, i.e. where $u_1 + u_2 + \cdots + u_n < n/(n + 1)$, we have a subdomain defined by the statement: $r$ of the variables $u_i$ are greater than $(n + 1)^{-1}$ and the residual group of $n - r$ of the $u_i$ are less than $(n + 1)^{-1}$. There are $\binom{n}{r}$ such subdomains and it is clear that, because of the symmetry in the $u_i$, the integral of $\left[ \tfrac{1}{2} \sum_{i=1}^{n+1} \left| u_i - \tfrac{1}{n+1} \right| \right]^k$ over each such subdomain is the same. There are altogether $\sum_{r=0}^{n-1} \binom{n}{r} = 2^n - 1$ such subdomains; $r \ne n$ because of the inequality $u_1 + u_2 + \cdots + u_n < n/(n + 1)$. In the part of the domain above the hyperplane

$$u_1 + u_2 + \cdots + u_n = n/(n + 1),$$

i.e. where $u_1 + u_2 + \cdots + u_n > n/(n + 1)$, the reasoning is exactly the same
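except that here $r \ne 0$. Thus we may write

$$\alpha_{nk} = n! \sum_{r=0}^{n-1} \binom{n}{r} \int \cdots \int_{D_{r1}} \left[ \sum_{i=r+1}^{n} \left( \frac{1}{n+1} - u_i \right) \right]^k du_1 \cdots du_n + n! \sum_{r=1}^{n} \binom{n}{r} \int \cdots \int_{D_{r2}} \left[ \sum_{i=1}^{r} \left( u_i - \frac{1}{n+1} \right) \right]^k du_1 \cdots du_n,$$

where $D_{r1}$ is the domain

$$\sum_{i=1}^{n} u_i < \frac{n}{n+1}, \qquad u_i > \frac{1}{n+1} \;\; (i = 1, 2, \cdots, r), \qquad 0 < u_i < \frac{1}{n+1} \;\; (i = r + 1, \cdots, n),$$

and $D_{r2}$ is the domain

$$\frac{n}{n+1} < \sum_{i=1}^{n} u_i < 1, \qquad u_i > \frac{1}{n+1} \;\; (i = 1, 2, \cdots, r), \qquad 0 < u_i < \frac{1}{n+1} \;\; (i = r + 1, \cdots, n).$$

If we introduce the variables

$$z_i = u_i - \frac{1}{n+1} \;\; (i = 1, 2, \cdots, r), \qquad z_i = \frac{1}{n+1} - u_i \;\; (i = r + 1, \cdots, n),$$

we get

$$\alpha_{nk} = n! \sum_{r=0}^{n-1} \binom{n}{r} \int \cdots \int_{\Delta_{r1}} \left( \sum_{i=r+1}^{n} z_i \right)^k dz_1 \cdots dz_n + n! \sum_{r=1}^{n} \binom{n}{r} \int \cdots \int_{\Delta_{r2}} \left( \sum_{i=1}^{r} z_i \right)^k dz_1 \cdots dz_n,$$

where $\Delta_{r1}$ is the domain

$$\sum_{i=1}^{r} z_i < \sum_{i=r+1}^{n} z_i, \qquad z_i > 0 \;\; (i = 1, 2, \cdots, r), \qquad \frac{1}{n+1} > z_i > 0 \;\; (i = r + 1, \cdots, n),$$

and $\Delta_{r2}$ is the domain

$$\sum_{i=r+1}^{n} z_i < \sum_{i=1}^{r} z_i < \frac{1}{n+1} + \sum_{i=r+1}^{n} z_i, \qquad z_i > 0 \;\; (i = 1, 2, \cdots, r), \qquad \frac{1}{n+1} > z_i > 0 \;\; (i = r + 1, \cdots, n).$$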

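To effect the integrations with respect to the variables $z_1, z_2, \cdots, z_r$ we take as volume element in the $r$-space of $z_1, z_2, \cdots, z_r$ the volume between the hyperplanes $z_1 + z_2 + \cdots + z_r = C$, $z_i > 0$ and $z_1 + z_2 + \cdots + z_r = C + dC$, $z_i > 0$. This volume element is $d \dfrac{C^r}{r!} = \dfrac{C^{r-1}}{(r-1)!} \, dC$. Thus (for $r = 0$ the inner bracket in the first sum is understood to be 1)

$$\begin{aligned} \alpha_{nk} &= n! \sum_{r=0}^{n-1} \binom{n}{r} \int_0^{1/(n+1)} \!\!\cdots \int_0^{1/(n+1)} \left[ \int_0^{\sum_{i=r+1}^{n} z_i} \frac{C^{r-1}}{(r-1)!} \, dC \right] \left( \sum_{i=r+1}^{n} z_i \right)^k dz_{r+1} \cdots dz_n \\ &\quad + n! \sum_{r=1}^{n} \binom{n}{r} \int_0^{1/(n+1)} \!\!\cdots \int_0^{1/(n+1)} \left[ \int_{\sum_{i=r+1}^{n} z_i}^{\frac{1}{n+1} + \sum_{i=r+1}^{n} z_i} \frac{C^{k+r-1}}{(r-1)!} \, dC \right] dz_{r+1} \cdots dz_n \\ &= n! \sum_{r=0}^{n-1} \binom{n}{r} \int_0^{1/(n+1)} \!\!\cdots \int_0^{1/(n+1)} \frac{1}{r!} \, (z_{r+1} + \cdots + z_n)^{k+r} \, dz_{r+1} \cdots dz_n \\ &\quad + n! \sum_{r=1}^{n} \binom{n}{r} \int_0^{1/(n+1)} \!\!\cdots \int_0^{1/(n+1)} \frac{1}{(k+r)(r-1)!} \left( \frac{1}{n+1} + z_{r+1} + \cdots + z_n \right)^{k+r} dz_{r+1} \cdots dz_n \\ &\quad - n! \sum_{r=1}^{n} \binom{n}{r} \int_0^{1/(n+1)} \!\!\cdots \int_0^{1/(n+1)} \frac{1}{(k+r)(r-1)!} \, (z_{r+1} + \cdots + z_n)^{k+r} \, dz_{r+1} \cdots dz_n . \end{aligned}$$

In order to perform these integrations we use the formula

$$\int_0^A \!\!\cdots \int_0^A (B + x_1 + x_2 + \cdots + x_n)^m \, dx_1 \cdots dx_n = \frac{m!}{(m+n)!} \sum_{q=0}^{n} (-1)^{n-q} \binom{n}{q} (B + qA)^{m+n},$$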





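which is established immediately by induction on $n$. Then

$$\begin{aligned} \alpha_{nk} &= n! \sum_{r=0}^{n-1} \sum_{q=0}^{n-r} \frac{(-1)^{n-r-q} (k+r)!}{r! \, (n+k)!} \binom{n}{r} \binom{n-r}{q} \left( \frac{q}{n+1} \right)^{n+k} \\ &\quad + n! \sum_{r=1}^{n} \sum_{q=0}^{n-r} \frac{(-1)^{n-r-q} (k+r-1)!}{(r-1)! \, (n+k)!} \binom{n}{r} \binom{n-r}{q} \left( \frac{q+1}{n+1} \right)^{n+k} \\ &\quad - n! \sum_{r=1}^{n} \sum_{q=0}^{n-r} \frac{(-1)^{n-r-q} (k+r-1)!}{(r-1)! \, (n+k)!} \binom{n}{r} \binom{n-r}{q} \left( \frac{q}{n+1} \right)^{n+k}. \end{aligned}$$

The first of these double sums is equal to

$$\frac{n! \, k!}{(n+k)!} \sum_{r=0}^{n-1} \sum_{q=0}^{n-r} (-1)^{n-r-q} \binom{k+r}{r} \binom{n}{r} \binom{n-r}{q} \left( \frac{q}{n+1} \right)^{n+k} = \binom{n+k}{k}^{-1} \sum_{q=0}^{n} \binom{n}{q} \left( \frac{q}{n+1} \right)^{n+k} \left[ \sum_{r=0}^{n-q} (-1)^{n-q-r} \binom{n-q}{r} \binom{k+r}{r} \right].$$

Let us assume first that $n \ge k$. The expression within the brackets is the coefficient of $x^{n-q}$ in $(1 - x)^{n-q} \bigl( 1/(1 - x)^{k+1} \bigr) = (1 - x)^{n-q-k-1}$ and this is different from zero only when $q \ge n - k$ and then it has the value $\binom{k}{n-q}$. Thus the first double sum is equal to

$$\binom{n+k}{k}^{-1} \sum_{q=n-k}^{n} \binom{k}{n-q} \binom{n}{q} \left( \frac{q}{n+1} \right)^{n+k} = \binom{n+k}{k}^{-1} \sum_{s=0}^{k} \binom{k}{s} \binom{n}{s} \left( \frac{n-s}{n+1} \right)^{n+k}.$$

Similarly the second double sum is equal to

$$\binom{n+k}{k}^{-1} \sum_{s=0}^{k-1} \binom{n}{s+1} \binom{k-1}{s} \left( \frac{n-s}{n+1} \right)^{n+k}$$

and the third, taken with its sign, is equal to

$$- \binom{n+k}{k}^{-1} \sum_{s=1}^{k} \binom{n}{s} \binom{k-1}{s-1} \left( \frac{n-s}{n+1} \right)^{n+k}.$$

Thus, using the identity

$$\binom{k}{s} \binom{n}{s} + \binom{n}{s+1} \binom{k-1}{s} - \binom{n}{s} \binom{k-1}{s-1} = \binom{n+1}{s+1} \binom{k-1}{s},$$

we get

$$\alpha_{nk} = \binom{n+k}{k}^{-1} \sum_{s=0}^{k-1} \binom{n+1}{s+1} \binom{k-1}{s} \left( \frac{n-s}{n+1} \right)^{n+k}.$$

If however $k > n$ then a similar argument shows that we get an expression for $\alpha_{nk}$ which differs from the above only in the upper limit of the summation, which is $n - 1$ in this case. Thus the theorem is proved.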
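The integration formula used in the proof of Theorem 1 can be checked independently for small cases. The sketch below is an illustration and not part of the paper: the helper names `lhs_by_expansion` and `rhs_closed_form` are hypothetical. The left side is obtained by expanding $(B + x_1 + \cdots + x_n)^m$ with the multinomial theorem and integrating each monomial over $[0, A]^n$ exactly; the right side is the alternating sum stated in the text.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def lhs_by_expansion(n, m, A, B):
    """Integrate (B + x_1 + ... + x_n)^m over [0, A]^n term by term (exact)."""
    A, B = Fraction(A), Fraction(B)
    total = Fraction(0)
    for js in product(range(m + 1), repeat=n):
        j0 = m - sum(js)              # exponent carried by the constant B
        if j0 < 0:
            continue
        denom = factorial(j0)
        for j in js:
            denom *= factorial(j)
        term = Fraction(factorial(m), denom) * B ** j0
        for j in js:
            term *= A ** (j + 1) / (j + 1)   # integral of x^j over [0, A]
        total += term
    return total


def rhs_closed_form(n, m, A, B):
    """m!/(m+n)! * sum_q (-1)^(n-q) C(n, q) (B + qA)^(m+n)."""
    A, B = Fraction(A), Fraction(B)
    s = sum((-1) ** (n - q) * comb(n, q) * (B + q * A) ** (m + n)
            for q in range(n + 1))
    return Fraction(factorial(m), factorial(m + n)) * s


if __name__ == "__main__":
    for n, m in [(1, 3), (2, 1), (3, 4)]:
        args = (n, m, Fraction(1, 4), Fraction(1, 3))
        print(n, m, lhs_by_expansion(*args), rhs_closed_form(*args))
```

Because both sides are computed with `Fraction`, agreement is exact rather than approximate; the enumeration grows quickly with $n$ and $m$, so the check is only practical for small values.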





The distribution function of $\omega_n$ is

71—r—1 

F(x) = 1 + 


£ £ (-I)"* 


n + 
q + 


0 


C ,+ r , )fcri)'(^--n 


+ i 

where $r$ is the non-negative integer determined by the inequality

$$\frac{r}{n+1} \le x < \frac{r+1}{n+1}.$$

$F(x) = 0$ when $x \le 0$, $F(x) = 1$ when $x \ge n/(n + 1)$ and $F(x)$ is a polynomial of degree $n$ in each of the intervals

$$\left( \frac{i-1}{n+1}, \; \frac{i}{n+1} \right), \qquad i = 1, 2, \cdots, n.$$

Theorem 2. The random variable $\omega_n$ is asymptotically normally distributed $(E(\omega_n), D(\omega_n))$; i.e., the distribution function of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.$$

It is sufficient to prove that the moments of the standardized variable approach the moments of the normal distribution. For in general it is known that if the moments $\alpha_{nk}$ of $F_n(x)$ approach the moments $\alpha_k$ of a uniquely determined distribution function $F(x)$, then $F_n(x)$ converges to $F(x)$ in every continuity point of the latter (M. G. Kendall, Advanced Theory of Statistics, Vol. 1, Third edition, Charles Griffin and Co., 1943, pp. 110-112).

Now $E(\omega_n) = e^{-1} + O(n^{-1})$ and $D^2(\omega_n) \approx \dfrac{2e - 5}{e^2} \cdot \dfrac{1}{n} = \dfrac{c}{n}$, where $c = (2e - 5)/e^2$, so that the two variables

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)} \qquad \text{and} \qquad \left( \frac{n}{c} \right)^{1/2} \left( \omega_n - \frac{1}{e} \right)$$

have the same limiting distribution. Thus it is sufficient to prove that the moments of $\left( \dfrac{n}{c} \right)^{1/2} \left( \omega_n - \dfrac{1}{e} \right)$ tend to the moments

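of the normal distribution. In the following argument we take $\mu = k$ since $n \to \infty$.

$$E\left[ \left( \frac{n}{c} \right)^{m/2} \left( \omega_n - \frac{1}{e} \right)^m \right] = \frac{n^{m/2} \, m!}{(2e - 5)^{m/2}} \left[ \frac{(-1)^m}{m!} + \sum_{k=1}^{m} \sum_{s=0}^{k-1} \frac{(-1)^{m-k} \, n! \, e^k}{(n+k)! \, (m-k)!} \binom{n+1}{s+1} \binom{k-1}{s} \left( \frac{n-s}{n+1} \right)^{n+k} \right]. \tag{2.1}$$

Suppose now that it has been proved that $E\left[ \left( \dfrac{n}{c} \right)^{m} \left( \omega_n - \dfrac{1}{e} \right)^{2m} \right]$ tends to a finite limit as $n \to \infty$, i.e., that the limiting moments of order $2m$ exist, $m = 1, 2, \cdots$. If $m$ is odd

$$\left| E\left[ \left( \frac{n}{c} \right)^{m/2} \left( \omega_n - \frac{1}{e} \right)^{m} \right] \right| \le \left\{ E\left[ \left( \frac{n}{c} \right)^{m} \left( \omega_n - \frac{1}{e} \right)^{2m} \right] \right\}^{1/2}.$$

Hence, if $m$ is odd, $E\left[ \left( \dfrac{n}{c} \right)^{m/2} \left( \omega_n - \dfrac{1}{e} \right)^{m} \right]$ is bounded as $n \to \infty$. Now the expression in the bracket on the right of (2.1) can be expanded in a convergent power series in $n^{-1}$ provided that $n > m$. Because of the factor $n^{m/2}$ and because the left hand side of (2.1) is bounded as $n \to \infty$ this power series must have $\dfrac{1}{n^p}$, where $p \ge \dfrac{m+1}{2}$ (since $m$ is odd), as its initial non-vanishing term. But then the left hand side of (2.1) must approach 0 as $n \to \infty$. Thus if the limiting moments of even order exist the limiting moments of odd order are zero. We may now restrict the discussion to even order moments.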

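Replacing $m$ by $2m$ in (2.1)

$$E\left[ \left( \frac{n}{c} \right)^{m} \left( \omega_n - \frac{1}{e} \right)^{2m} \right] = \frac{n^{m} (2m)!}{(2e - 5)^{m}} \left[ \frac{1}{(2m)!} + \sum_{k=1}^{2m} \sum_{s=0}^{k-1} \frac{(-1)^{k} \, n! \, e^k}{(n+k)! \, (2m-k)!} \binom{n+1}{s+1} \binom{k-1}{s} \left( \frac{n-s}{n+1} \right)^{n+k} \right].$$

Let us introduce the index $q = k - s - 1$ which runs from 0 to $2m - 1$. Then

$$\begin{aligned} E\left[ \left( \frac{n}{c} \right)^{m} \left( \omega_n - \frac{1}{e} \right)^{2m} \right] &= \frac{n^{m} (2m)!}{(2e - 5)^{m}} \left[ \frac{1}{(2m)!} + \sum_{q=0}^{2m-1} \sum_{k=q+1}^{2m} \frac{(-1)^{k} \, n! \, e^k}{(n+k)! \, (2m-k)!} \binom{n+1}{k-q} \binom{k-1}{q} \left( \frac{n-k+q+1}{n+1} \right)^{n+k} \right] \\ &= \frac{n^{m} (2m)!}{(2e - 5)^{m}} \left[ a_0 + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots + \frac{a_m}{n^m} + \frac{a_{m+1}}{n^{m+1}} + \cdots \right]. \end{aligned}$$

In order for $\lim_{n \to \infty} E\left[ \left( \dfrac{n}{c} \right)^{m} \left( \omega_n - \dfrac{1}{e} \right)^{2m} \right]$ to exist it is necessary to show that $a_i = 0$, $i = 0, 1, 2, \cdots, m - 1$. Then

$$\lim_{n \to \infty} E\left[ \left( \frac{n}{c} \right)^{m} \left( \omega_n - \frac{1}{e} \right)^{2m} \right] = \frac{(2m)! \, a_m}{(2e - 5)^{m}}.$$

If we determine the coefficient $a_{iq}$ of $n^{-i}$ in the expansion in powers of $n^{-1}$ of

$$\sum_{k=q+1}^{2m} \frac{(-1)^{k} \, n! \, e^k}{(n+k)! \, (2m-k)!} \binom{n+1}{k-q} \binom{k-1}{q} \left( \frac{n-k+q+1}{n+1} \right)^{n+k} = \sum_{i} \frac{a_{iq}}{n^{i}}, \tag{2.2}$$

we will then have

$$a_j = \sum_{q=0}^{j} a_{jq}, \qquad j = 1, 2, \cdots, m. \tag{2.3}$$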


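It can be established at once that $a_0 = 0$. For if we set $q = 0$ in (2.2) and let $n \to \infty$ then (2.2) has the limit $\sum_{k=1}^{2m} \dfrac{(-1)^k}{(2m-k)! \, k!} = -\dfrac{1}{(2m)!}$. To determine the expansion of (2.2) in powers of $n^{-1}$ it is sufficient to focus attention on the expansion in powers of $n^{-1}$ of

$$\frac{n!}{(n+k)!} \, (n+1)(n) \cdots (n-k+q+2) \left( \frac{n-k+q+1}{n+1} \right)^{n+k} = \frac{(n+1)(n) \cdots (n-k+q+2)}{(n+k)(n+k-1) \cdots (n+1)} \left( \frac{n-k+q+1}{n+1} \right)^{n+k}$$

or equivalently (with $x = 1/n$) on the expansion in powers of $x$ of the function

$$\begin{aligned} \frac{\left( \frac{1}{x} + 1 \right) \left( \frac{1}{x} \right) \cdots \left( \frac{1}{x} - k + q + 2 \right)}{\left( \frac{1}{x} + k \right) \left( \frac{1}{x} + k - 1 \right) \cdots \left( \frac{1}{x} + 1 \right)} \left( \frac{\frac{1}{x} - k + q + 1}{\frac{1}{x} + 1} \right)^{\frac{1}{x} + k} &= x^{q} \, \frac{(1 - x)(1 - 2x) \cdots (1 - (k - q - 2)x)}{(1 + 2x)(1 + 3x) \cdots (1 + kx)} \left( \frac{1 - (k - q - 1)x}{1 + x} \right)^{\frac{1}{x} + k} \\ &= x^{q} (a_{kq0} + a_{kq1} x + a_{kq2} x^2 + \cdots) = x^{q} F(x). \end{aligned}$$

Here $a_{kq0} = e^{-k+q}$ and the other coefficients may be obtained by a recursion formula. Thus:

$$a_{kqp} = \frac{1}{p!} D^{p} F(x) \Big|_{x=0} = \frac{1}{p!} D^{p-1} \bigl[ F(x) \, D \log F(x) \bigr]_{x=0} = \frac{1}{p!} \sum_{s=0}^{p-1} \binom{p-1}{s} D^{p-s-1} F(x) \, D^{s+1} \log F(x) \Big|_{x=0}.$$

But

$$\begin{aligned} \log F(x) &= \left( \frac{1}{x} + k \right) \log (1 - (k - q - 1)x) - \left( \frac{1}{x} + k \right) \log (1 + x) + \sum_{s=1}^{k-q-2} \log (1 - sx) - \sum_{s=2}^{k} \log (1 + sx) \\ &= -(k - q) + \sum_{s=0}^{\infty} b_{kqs} \, x^{s+1}, \end{aligned}$$

where

$$b_{kqs} = -\frac{(k-q-1)^{s+2}}{s+2} - \frac{k (k-q-1)^{s+1}}{s+1} + \frac{(-1)^{s}}{s+2} - \frac{(-1)^{s} k}{s+1} - \frac{1}{s+1} \sum_{t=1}^{k-q-2} t^{s+1} - \frac{(-1)^{s}}{s+1} \sum_{t=2}^{k} t^{s+1},$$

so that

$$a_{kqp} = \frac{1}{p!} \sum_{s=0}^{p-1} \binom{p-1}{s} (p - s - 1)! \, a_{kq(p-s-1)} \, (s+1)! \, b_{kqs} = \frac{1}{p} \sum_{s=0}^{p-1} (s+1) \, a_{kq(p-s-1)} \, b_{kqs}.$$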

Of $b_{kqs}$ we need merely notice that it is a polynomial in $k$ of degree $s + 2$ and that $b_{kq0} = -\tfrac{5}{2} k^2 + Ak + B$, where $A$ and $B$ depend on $q$ only. We wish to determine the value of $a_{kq(i-q)}$ and to this end we solve the system of linear equations

$$a_{kq0} = e^{-k+q},$$

$$\frac{1}{p} \sum_{s=0}^{p-1} (s+1) \, a_{kq(p-s-1)} \, b_{kqs} - a_{kqp} = 0, \qquad p = 1, 2, \cdots, i - q.$$

$a_{kq(i-q)}$ is therefore a quotient of two determinants. The determinant in the denominator has the value $(-1)^{i-q}$ while the determinant in the numerator can be expanded by its last column and is therefore the product of $(-1)^{i-q} e^{-k+q}$ and a determinant $B_{kqi}$ whose entries $d_{\alpha\beta}$, $\alpha, \beta = 1, 2, \cdots, i - q$, can be described as follows. If $\beta > \alpha + 1$ then $d_{\alpha\beta} = 0$. $d_{\alpha(\alpha+1)} = -1$ and when $\beta \le \alpha$, $d_{\alpha\beta} = \dfrac{\alpha - \beta + 1}{\alpha} \, b_{kq(\alpha-\beta)}$, a polynomial of degree $\alpha - \beta + 2$. Thus $a_{kq(i-q)} = e^{-k+q} B_{kqi}$.

The determinant $B_{kqi}$ is a polynomial of degree $2(i - q)$ in $k$ and the term of this degree comes only from the product of the diagonal elements. For $B_{kqi} = |d_{\alpha\beta}| = \sum \pm \prod_{\alpha=1}^{i-q} d_{\alpha\sigma(\alpha)}$ where $\sigma(\alpha) \le \alpha + 1$ and $(\sigma(1), \sigma(2), \cdots, \sigma(i - q))$ is a permutation of $(1, 2, \cdots, i - q)$. The term $\prod_{\alpha=1}^{i-q} d_{\alpha\sigma(\alpha)}$ has degree $\sum_{\alpha=1}^{i-q} (\alpha - \sigma(\alpha) + \delta(\alpha)) = \sum_{\alpha=1}^{i-q} \delta(\alpha)$ where $\delta(\alpha) = 2$ if $\sigma(\alpha) \le \alpha$ and $\delta(\alpha) = 1$ if $\sigma(\alpha) = \alpha + 1$. But $\sum_{\alpha=1}^{i-q} \delta(\alpha) = 2(i - q) \leftrightarrow \delta(\alpha) = 2 \leftrightarrow \sigma(\alpha) \le \alpha \leftrightarrow \sigma(\alpha) = \alpha$, so that it is the product of the diagonal terms and only that product which gives rise to the term of degree $2(i - q)$ in the expansion. Thus

$$B_{kqi} = \frac{1}{(i-q)!} (b_{kq0})^{i-q} + \text{terms of lower degree in } k = \frac{1}{(i-q)!} \left( -\frac{5}{2} \right)^{i-q} k^{2(i-q)} + \cdots.$$

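We are now in position to evaluate $a_{iq}$.

$$\begin{aligned} a_{iq} &= \sum_{k=q+1}^{2m} \frac{(-1)^k e^k}{(2m-k)! \, (k-q)!} \binom{k-1}{q} a_{kq(i-q)} \\ &= \sum_{k=q+1}^{2m} \frac{(-1)^k e^q}{(2m-k)! \, (k-q)!} \binom{k-1}{q} B_{kqi} \\ &= \frac{e^q}{(i-q)!} \left( -\frac{5}{2} \right)^{i-q} \sum_{k=q+1}^{2m} \frac{(-1)^k k^{2(i-q)}}{(2m-k)! \, (k-q)!} \binom{k-1}{q} + \sum_{k=q+1}^{2m} \frac{(-1)^k \, [\text{terms of lower degree in } k]}{(2m-k)! \, (k-q)!} \binom{k-1}{q}. \end{aligned} \tag{2.4}$$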





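To complete the evaluation of $a_{iq}$ we observe that

$$\sum_{k=q+1}^{2m} \frac{(-1)^k k^{l}}{(2m-k)! \, (k-q)!} \binom{k-1}{q} = \begin{cases} \dfrac{1}{q!} & \text{if } l = 2(m - q), \\[1.5ex] 0 & \text{if } l < 2(m - q). \end{cases} \tag{2.5}$$

(2.5) implies that $a_{iq} = 0$ if $i < m$ and therefore $a_j = 0$ if $j < m$. The proof of (2.5) is brief. We note that $k^{l-1} = \sum_{j=0}^{l-1} c_j \binom{k-q-1}{j}$, where $c_j$ is independent of $k$ and $c_{l-1} = (l - 1)!$. Then

$$\begin{aligned} \sum_{k=q+1}^{2m} \frac{(-1)^k k^{l}}{(2m-k)! \, (k-q)!} \binom{k-1}{q} &= \sum_{k=q+1}^{2m} \frac{(-1)^k k^{l-1} \, k!}{q! \, (k-q-1)! \, (2m-q)!} \binom{2m-q}{k-q} \\ &= \sum_{j=0}^{l-1} \sum_{k=q+1}^{2m} (-1)^k \frac{c_j \, k!}{(2m-q)! \, q! \, (k-q-1)!} \binom{k-q-1}{j} \binom{2m-q}{k-q} \\ &= \sum_{j=0}^{l-1} \frac{c_j \, (j+q+1)!}{(2m-q)! \, q! \, j!} \left[ \sum_{k=q+1}^{2m} (-1)^k \binom{2m-q}{k-q} \binom{k}{j+q+1} \right]. \end{aligned}$$

The expression within the brackets is the coefficient of $x^{2m-j-q-1}$ in $(1 - x)^{2m-q} \dfrac{1}{(1 - x)^{j+q+2}} = (1 - x)^{2m-2q-j-2}$ and this is zero if $j < 2(m - q) - 1$ and 1 if $j = 2(m - q) - 1$. Accordingly

$$\sum_{k=q+1}^{2m} \frac{(-1)^k k^{l}}{(2m-k)! \, (k-q)!} \binom{k-1}{q} = \begin{cases} \dfrac{[2(m-q) - 1]! \, [2(m-q) - 1 + q + 1]!}{(2m-q)! \, [2(m-q) - 1]! \, q!} = \dfrac{1}{q!} & \text{if } l = 2(m - q), \\[2ex] 0 & \text{if } l < 2(m - q), \end{cases}$$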

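and (2.5) is established. Returning to (2.4), $a_{iq} = 0$ when $i < m$, while

$$a_{mq} = \frac{e^q}{(m-q)! \, q!} \left( -\frac{5}{2} \right)^{m-q},$$

and now applying this expression to (2.3)

$$a_m = \sum_{q=0}^{m} \frac{e^q}{(m-q)! \, q!} \left( -\frac{5}{2} \right)^{m-q} = \frac{1}{m!} \left( e - \frac{5}{2} \right)^{m} = \frac{(2e - 5)^m}{m! \, 2^m}.$$

Thus

$$\lim_{n \to \infty} E\left[ \left( \frac{n}{c} \right)^{m} \left( \omega_n - \frac{1}{e} \right)^{2m} \right] = \frac{a_m (2m)!}{(2e - 5)^m} = \frac{(2m)!}{m! \, 2^m},$$

and these are precisely the even order moments of the normal distribution. $\left( \dfrac{n}{c} \right)^{1/2} \left( \omega_n - \dfrac{1}{e} \right)$ is asymptotically normal and so is $\dfrac{\omega_n - E(\omega_n)}{D(\omega_n)}$.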
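Identity (2.5), on which the vanishing of the lower coefficients rests, lends itself to an exact check for small $m$ and $q$. The sketch below is illustrative only (the name `identity_sum` is hypothetical); it evaluates the left side of (2.5) with rational arithmetic for $l \ge 1$ and compares it with $1/q!$ or $0$.

```python
from fractions import Fraction
from math import comb, factorial


def identity_sum(m: int, q: int, l: int) -> Fraction:
    """Left side of (2.5): sum over k = q+1, ..., 2m."""
    return sum(
        Fraction((-1) ** k * k ** l * comb(k - 1, q),
                 factorial(2 * m - k) * factorial(k - q))
        for k in range(q + 1, 2 * m + 1)
    )


if __name__ == "__main__":
    for m in range(1, 5):
        for q in range(0, m):
            top = 2 * (m - q)
            # Value at l = 2(m - q) should be 1/q!; lower powers l >= 1 give 0.
            print(m, q, identity_sum(m, q, top),
                  [identity_sum(m, q, l) for l in range(1, top)])
```

For example, with $m = 2$, $q = 1$ the three terms for $l = 2$ are $2 - 9 + 8 = 1 = 1/1!$, and for $l = 1$ they are $1 - 3 + 2 = 0$.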





The skewness $\beta_1$ and kurtosis $\beta_2$ of the standardized variable $\dfrac{\omega_n - E(\omega_n)}{D(\omega_n)}$ are

A - i — q, 4 - eji 70> - + o<-*-) ' 

„ , 1 24c 8 - 336e 2 + 1368<5 - 1718 
01 ~ 3 + n' (2/-"5)» 


.356 


n 


+ 0(n~ 2 ), 


+ 0(n -2 ) = 3 - 


105 


n 


+ 0(n~ 2 ). 


3. Consistency. According to previous discussion in order to prove the consistency of the test for goodness of fit based on the asymptotically normal variable $\dfrac{\omega_n - E(\omega_n)}{D(\omega_n)}$ it is sufficient to show that, if $x_1, x_2, \cdots, x_n$ is an ordered sample from a population whose distribution function is $G(x)$, then the limiting mean of the random variable

$$\Omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|$$

is not equal to $e^{-1}$ if $F(x) \not\equiv G(x)$ and the limiting variance of this variable is zero. This is the content of the next two theorems. In connection with these theorems it is to be observed that, when $y = F(x)$ is continuous, $F^{-1}(y)$, $0 \le y \le 1$, can be defined unambiguously by writing $F^{-1}(y) = [\operatorname{Sup} x : y = F(x)]$ except for $y = 0$, and $F^{-1}(0) = -\infty$. The function $k(x) = GF^{-1}(x)$ is then a non-decreasing function mapping $[0, 1]$ into $[0, 1]$ and such that $k(0) = 0$ and $k(1) = 1$. Now if $F'(x)$ exists for all but a finite number of points and is never zero then $F^{-1}(x)$ is continuous and so is $k(x)$. If further $G'(x)$ and $F'(x)$ exist and are continuous except for a finite number of points then ($F'(x) \ne 0$) $k'(x)$ enjoys the same property. These remarks justify the substitutions and partial integrations that are effected in the course of the next two theorems.

Theorem 3. Let $F(x)$ and $G(x)$ be continuous distribution functions whose derivatives exist and are continuous except for a finite number of points. If $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from the population whose distribution function is $G(x)$ then ($k(x) = GF^{-1}(x)$)

$$E(\Omega_n) = E\left( \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) = \int_0^{n/(n+1)} \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx \to \int_0^1 e^{-k'(x)} \, dx.$$

The integral $\int_0^1 e^{-k'(x)} \, dx$ has, relative to the class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, the minimum value $e^{-1}$ and assumes that value only when $k(x) \equiv x$, i.e. $F(x) \equiv G(x)$.

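Let us suppose first that $F'(x) \ne 0$. Then $F^{-1}(x)$ is continuous and it is differentiable at all but a finite number of points as is also the function $GF^{-1}(x) = k(x)$.

$$\begin{aligned} E(\Omega_n) &= \frac{1}{2} \sum_{i=1}^{n+1} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) \\ &= \frac{1}{2} E\left( \left| F(x_1) - \frac{1}{n+1} \right| \right) + \frac{1}{2} E\left( \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right) + \frac{1}{2} \sum_{i=2}^{n} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right). \end{aligned} \tag{3.1}$$

The joint probability density element of $x_{i-1}$ and $x_i$ is

$$\frac{n!}{(i-2)! \, (n-i)!} \, G(x_{i-1})^{i-2} (1 - G(x_i))^{n-i} \, dG(x_{i-1}) \, dG(x_i)$$

in the domain $-\infty < x_{i-1} < x_i < +\infty$ and zero outside that domain. Hence

$$\begin{aligned} \frac{1}{2} \sum_{i=2}^{n} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) &= \frac{1}{2} \sum_{i=2}^{n} \iint\limits_{x_{i-1} < x_i} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \frac{n!}{(i-2)! \, (n-i)!} \, G(x_{i-1})^{i-2} (1 - G(x_i))^{n-i} \, dG(x_{i-1}) \, dG(x_i) \\ &= \frac{1}{2} n(n-1) \iint\limits_{-\infty < X < Y < +\infty} \left| F(Y) - F(X) - \frac{1}{n+1} \right| [1 - G(Y) + G(X)]^{n-2} \, dG(X) \, dG(Y), \end{aligned}$$

and making the transformation $y = F(Y)$ and $x = F(X)$ the integral on the right can be written

$$\begin{aligned} &\frac{1}{2} n(n-1) \int_0^1 \!\! \int_0^y \left| y - x - \frac{1}{n+1} \right| [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y) \\ &\quad = -\frac{1}{2} n(n-1) \int_0^1 \!\! \int_0^y \left( y - x - \frac{1}{n+1} \right) [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y) \\ &\qquad + n(n-1) \int_{1/(n+1)}^1 \int_0^{y - 1/(n+1)} \left( y - x - \frac{1}{n+1} \right) [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y). \end{aligned}$$

Integrating partially with respect to $x$, the expression on the right becomes

$$\begin{aligned} &\frac{n}{2(n+1)} + \frac{n}{2} \int_0^1 \left( y - \frac{1}{n+1} \right) [1 - k(y)]^{n-1} \, dk(y) - \frac{n}{2} \int_0^1 \!\! \int_0^y [1 - k(y) + k(x)]^{n-1} \, dx \, dk(y) \\ &\quad - n \int_{1/(n+1)}^1 \left( y - \frac{1}{n+1} \right) [1 - k(y)]^{n-1} \, dk(y) + n \int_{1/(n+1)}^1 \int_0^{y - 1/(n+1)} [1 - k(y) + k(x)]^{n-1} \, dx \, dk(y), \end{aligned}$$

and now integrating with respect to $y$

$$\begin{aligned} \frac{1}{2} \sum_{i=2}^{n} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) &= -\frac{1}{n+1} + \frac{1}{2} \int_0^1 [1 - k(x)]^n \, dx + \frac{1}{2} \int_0^1 k(x)^n \, dx \\ &\quad - \int_{1/(n+1)}^1 [1 - k(x)]^n \, dx - \int_0^{n/(n+1)} k(x)^n \, dx \\ &\quad + \int_0^{n/(n+1)} \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx. \end{aligned}$$

The other two terms in (3.1) are treated similarly. The probability density element of $x_1$ is $n(1 - G(x_1))^{n-1} \, dG(x_1)$ so that

$$\begin{aligned} \frac{1}{2} E\left( \left| F(x_1) - \frac{1}{n+1} \right| \right) &= \frac{n}{2} \int_{-\infty}^{\infty} \left| F(x_1) - \frac{1}{n+1} \right| (1 - G(x_1))^{n-1} \, dG(x_1) \\ &= \frac{n}{2} \int_0^1 \left| x - \frac{1}{n+1} \right| (1 - k(x))^{n-1} \, dk(x) \\ &= \frac{1}{2(n+1)} - \frac{1}{2} \int_0^{1/(n+1)} (1 - k(x))^n \, dx + \frac{1}{2} \int_{1/(n+1)}^1 (1 - k(x))^n \, dx. \end{aligned}$$

Similarly we find that

$$\frac{1}{2} E\left( \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right) = \frac{1}{2(n+1)} + \frac{1}{2} \int_0^{n/(n+1)} k(x)^n \, dx - \frac{1}{2} \int_{n/(n+1)}^1 k(x)^n \, dx.$$

Thus

$$E(\Omega_n) = \int_0^{n/(n+1)} \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx.$$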

This result is, however, independent of the hypothesis $F'(x) \ne 0$. For if $F'(x)$ is sometimes zero we may select a sequence of distribution functions $F_m(x)$, $m = 1, 2, \cdots$, which converges everywhere to $F(x)$ and which is such that $F_m'(x) \ne 0$. The $F_m(x)$ otherwise satisfy the conditions of the theorem. If $\Omega_{mn}$ is that function of $x_1, x_2, \cdots, x_n$ obtained by replacing $F(x)$ by $F_m(x)$ in $\Omega_n$ then $\Omega_{mn}$ converges to $\Omega_n$ for every fixed set of $x_1, x_2, \cdots, x_n$ and $E(\Omega_{mn})$ converges to $E(\Omega_n)$ since both $\Omega_{mn}$ and $\Omega_n$ are bounded by 1. Furthermore if $x_0$ is any value such that $F'(x_0) \ne 0$ and $y_0 = F(x_0)$ then $F_m^{-1}(y_0)$ converges to $F^{-1}(y_0) = x_0$. For if $x_1$ is a cluster point of the set $F_m^{-1}(y_0)$, then there exists, for a given $\epsilon$, a sufficiently large $m$ such that $|F(x_1) - F_m(x_1)| < \epsilon$ (because $F_m(x) \to F(x)$) while, for the same $m$, $|F_m(x_1) - y_0| < \epsilon$ because of the continuity of $F_m(x)$. Thus $|F(x_1) - y_0| < 2\epsilon$ and, since $\epsilon$ is arbitrary, $y_0 = F(x_1) = F(x_0)$. So $x_1 = x_0$ since $F'(x_0) \ne 0$. Thus $F_m^{-1}(y) \to F^{-1}(y)$ for any value of $y$ such that if $x$ is mapped into $y$ by $F(x)$ then $F'(x) \ne 0$. This set on the $y$ axis however includes all $y$ except for a set of measure zero and so $F_m^{-1}(y) \to F^{-1}(y)$ almost everywhere. So $k_m(y) = GF_m^{-1}(y) \to GF^{-1}(y) = k(y)$ almost everywhere and

$$\left[ 1 - k_m\!\left( x + \frac{1}{n+1} \right) + k_m(x) \right]^n \to \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n$$

almost everywhere. Then

$$\int_0^{n/(n+1)} \left[ 1 - k_m\!\left( x + \frac{1}{n+1} \right) + k_m(x) \right]^n dx \to \int_0^{n/(n+1)} \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx$$

since both integrands are bounded by 1. Therefore the equality

$$E(\Omega_{mn}) = \int_0^{n/(n+1)} \left[ 1 - k_m\!\left( x + \frac{1}{n+1} \right) + k_m(x) \right]^n dx$$

is preserved as $m \to \infty$.

Now $k(x)$ is a monotonic function and hence has a derivative almost everywhere. Then

$$\left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n = \left[ 1 - \frac{1}{n+1} \cdot \frac{k\!\left( x + \frac{1}{n+1} \right) - k(x)}{\frac{1}{n+1}} \right]^n$$

converges to $e^{-k'(x)}$ almost everywhere. If we write

$$H_n(x) = \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n$$

when $0 \le x \le \dfrac{n}{n+1}$ and $H_n(x) = 0$ when $\dfrac{n}{n+1} < x \le 1$, then

$$\int_0^1 H_n(x) \, dx = \int_0^{n/(n+1)} \left[ 1 - k\!\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx \to \int_0^1 e^{-k'(x)} \, dx$$

as $n \to \infty$. The curve $y = e^{-x}$ lies always above its tangents and the tangent at $x = 1$ is $y = -\dfrac{x}{e} + \dfrac{2}{e}$. Thus $e^{-x} \ge -\dfrac{x}{e} + \dfrac{2}{e}$ for all $x$, equality holding only when $x = 1$, and therefore $e^{-k'(x)} \ge -\dfrac{1}{e} k'(x) + \dfrac{2}{e}$, equality holding only when $k'(x) = 1$. So

$$\int_0^1 e^{-k'(x)} \, dx \ge -\frac{1}{e} \int_0^1 k'(x) \, dx + \frac{2}{e},$$

equality holding if and only if $k'(x) = 1$ almost everywhere. But for any monotonic non-decreasing function

$$\int_0^1 k'(x) \, dx \le k(1) - k(0),$$

equality holding if and only if $k(x)$ is absolutely continuous. Hence

$$\int_0^1 e^{-k'(x)} \, dx \ge -\frac{1}{e} \int_0^1 k'(x) \, dx + \frac{2}{e} \ge \frac{1}{e},$$

and the equality runs through if and only if $k(x)$ is an absolutely continuous function such that $k'(x) = 1$ almost everywhere. But this is true of $k(x)$ if and only if $k(x) \equiv x$ and this in turn is true if and only if $F(x) \equiv G(x)$.
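The two integrals of Theorem 3 can be evaluated numerically for a chosen $k(x)$. The following sketch is an illustration under stated assumptions: the function names `mean_integral` and `limit_integral`, the midpoint rule, and the example $k(x) = x^2$ are not from the paper. It shows that the finite-$n$ integral reduces to $(n/(n+1))^{n+1}$ when $k(x) = x$ and that the limiting integral exceeds $e^{-1}$ for a $k(x)$ other than the identity.

```python
from math import e, exp


def mean_integral(k, n, steps=20000):
    """Midpoint-rule value of the integral of [1 - k(x + 1/(n+1)) + k(x)]^n over [0, n/(n+1)]."""
    a = 1.0 / (n + 1)
    h = (n / (n + 1)) / steps
    return h * sum((1.0 - k(h * (i + 0.5) + a) + k(h * (i + 0.5))) ** n
                   for i in range(steps))


def limit_integral(k_prime, steps=20000):
    """Midpoint-rule value of the integral of exp(-k'(x)) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(exp(-k_prime(h * (i + 0.5))) for i in range(steps))


if __name__ == "__main__":
    n = 200
    print(mean_integral(lambda x: x, n), (n / (n + 1)) ** (n + 1))  # case F = G
    print(mean_integral(lambda x: x * x, n))                        # k(x) = x^2
    print(limit_integral(lambda x: 2 * x), 1 / e)                   # limit versus 1/e
```

For $k(x) = x^2$ the limiting integral is $(1 - e^{-2})/2$, which is larger than $e^{-1}$, in line with the minimum property proved above.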

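Theorem 4. The random variable $\Omega_n$ has limiting variance zero; i.e.,

$$\lim_{n \to \infty} E(\Omega_n^2) = \left[ \int_0^1 e^{-k'(x)} \, dx \right]^2.$$

As before we assume first that $F'(x) \ne 0$. Then

$$\begin{aligned} E(\Omega_n^2) &= E\left[ \left( \frac{1}{2} \sum_{i=2}^{n} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right)^2 \right] + E\left[ \left| F(x_1) - \frac{1}{n+1} \right| \Omega_n \right] + E\left[ \left| 1 - F(x_n) - \frac{1}{n+1} \right| \Omega_n \right] \\ &\quad - E\left[ \left( \frac{1}{2} \left\{ \left| F(x_1) - \frac{1}{n+1} \right| + \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right\} \right)^2 \right]. \end{aligned} \tag{4.1}$$

Suppose $[\operatorname{Sup} x : k(x) = 0] = a$ and $[\operatorname{Inf} x : k(x) = 1] = b$. We may then obtain $\lim_{n \to \infty} E\left[ \left| F(x_1) - \dfrac{1}{n+1} \right| \Omega_n \right]$ in the following manner:

$$\left| E\left[ \left| F(x_1) - \frac{1}{n+1} \right| \Omega_n \right] - E[a \Omega_n] \right| \le E\left[ \left| F(x_1) - \frac{1}{n+1} - a \right| \Omega_n \right] \le \left\{ E\left[ \left( F(x_1) - \frac{1}{n+1} - a \right)^2 \right] \right\}^{1/2} [E(\Omega_n^2)]^{1/2}. \tag{4.2}$$

But $\Omega_n \le 1$ so that $E(\Omega_n^2)$ is bounded as $n \to \infty$. On the other hand

$$\begin{aligned} E\left[ \left( F(x_1) - \frac{1}{n+1} - a \right)^2 \right] &= n \int_{-\infty}^{\infty} \left( F(x_1) - \frac{1}{n+1} - a \right)^2 (1 - G(x_1))^{n-1} \, dG(x_1) \\ &= n \int_0^1 \left( x - a - \frac{1}{n+1} \right)^2 (1 - k(x))^{n-1} \, dk(x) \\ &= \left( a + \frac{1}{n+1} \right)^2 + 2 \int_0^1 \left( x - a - \frac{1}{n+1} \right) (1 - k(x))^n \, dx. \end{aligned}$$

As $n \to \infty$ the expression on the right tends to $a^2 + 2 \int_0^a (x - a) \, dx = 0$. Thus the expression on the right of (4.2) goes to zero as $n \to \infty$ and therefore

$$\lim_{n \to \infty} E\left[ \left| F(x_1) - \frac{1}{n+1} \right| \Omega_n \right] = \lim_{n \to \infty} E[a \Omega_n] = a \int_0^1 e^{-k'(x)} \, dx. \tag{4.3}$$

In a similar manner we obtain

$$\lim_{n \to \infty} E\left[ \left| 1 - F(x_n) - \frac{1}{n+1} \right| \Omega_n \right] = (1 - b) \int_0^1 e^{-k'(x)} \, dx \tag{4.4}$$

and

$$\lim_{n \to \infty} \left\{ -E\left[ \left( \frac{1}{2} \left\{ \left| F(x_1) - \frac{1}{n+1} \right| + \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right\} \right)^2 \right] \right\} = -\frac{1}{4} (a + 1 - b)^2. \tag{4.5}$$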
The first term on the right of (4.1) remains to be investigated. We have 

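\[ E\left[\left(\frac{1}{2}\sum_{i=2}^{n}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\right)^2\right] = \frac{1}{4}E\left[\sum_{i=2}^{n}\left(F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right)^2\right] \]
\[ \qquad + \frac{1}{2}E\left[\sum_{i=2}^{n-2}\sum_{j=i+2}^{n}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\left|F(x_j) - F(x_{j-1}) - \frac{1}{n+1}\right|\right] \tag{4.6} \]
\[ \qquad + \frac{1}{2}E\left[\sum_{i=2}^{n-1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\left|F(x_{i+1}) - F(x_i) - \frac{1}{n+1}\right|\right]. \]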


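The joint probability density element of $x_{i-1}$ and $x_i$ is

\[ \frac{n!}{(i-2)!\,(n-i)!}\,G(x_{i-1})^{i-2}\,(1 - G(x_i))^{n-i}\,dG(x_{i-1})\,dG(x_i) \]

so that

\[ \frac{1}{4}E\left[\sum_{i=2}^{n}\left(F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right)^2\right] = \frac{1}{4}n(n-1)\iint_{-\infty<X<Y<\infty}\left(F(Y) - F(X) - \frac{1}{n+1}\right)^2 [1 - G(Y) + G(X)]^{n-2}\,dG(X)\,dG(Y) \]
\[ = \frac{1}{4}n(n-1)\int_0^1\int_0^y\left(y - x - \frac{1}{n+1}\right)^2 [1 - k(y) + k(x)]^{n-2}\,dk(x)\,dk(y). \]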
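The passage from the joint density of $x_{i-1}$, $x_i$ to the double integral rests on summing the density weights over $i$, which is a binomial expansion. A minimal sketch of that identity, with hypothetical function names and arbitrary test values standing in for $G(x_{i-1})$ and $1 - G(x_i)$:

```python
from math import factorial, isclose

def summed_pair_density_weight(n, a, b):
    """Sum over i = 2..n of n!/((i-2)!(n-i)!) * a**(i-2) * b**(n-i)."""
    return sum(
        factorial(n) // (factorial(i - 2) * factorial(n - i)) * a ** (i - 2) * b ** (n - i)
        for i in range(2, n + 1)
    )

def closed_form(n, a, b):
    """n(n-1)(a+b)**(n-2), the factor appearing in front of the double integral."""
    return n * (n - 1) * (a + b) ** (n - 2)

# a plays the role of G(X) and b the role of 1 - G(Y); the values are arbitrary.
print(summed_pair_density_weight(7, 0.3, 0.5), closed_form(7, 0.3, 0.5))
```

With $a = G(X)$ and $b = 1 - G(Y)$ the closed form is $n(n-1)[1 - G(Y) + G(X)]^{n-2}$, the kernel used above.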





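In this latter double integral we integrate first with respect to $x$ and then with
respect to $y$ obtaining

\[ \frac{n}{4(n+1)^2} - \frac{1}{2(n+1)} - \frac{n}{4}\int_0^1\left(y - \frac{1}{n+1}\right)^2 [1 - k(y)]^{n-1}\,dk(y) - \frac{1}{2}\int_0^1\left(\frac{n}{n+1} - x\right)k^n(x)\,dx \]
\[ \qquad + \frac{1}{2}\iint_{0<x<y<1}[1 - k(y) + k(x)]^n\,dx\,dy, \]

and proceeding to the limit

\[ -\frac{1}{2}\int_0^a x\,dx - \frac{1}{2}\int_b^1 (1 - x)\,dx + \frac{1}{2}\iint_{\substack{0<x<y<1 \\ k(x)=k(y)}} dx\,dy = -\frac{1}{4}a^2 - \frac{1}{4}(1 - b)^2 + \frac{1}{2}\iint_{\substack{0<x<y<1 \\ k(x)=k(y)}} dx\,dy. \tag{4.7} \]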

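The joint probability density element of $x_{i-1}$, $x_i$, $x_{j-1}$, $x_j$ when $j > i + 1$ is

\[ \frac{n!}{(i-2)!\,(j-i-2)!\,(n-j)!}\,G(x_{i-1})^{i-2}\,[G(x_{j-1}) - G(x_i)]^{j-i-2}\,[1 - G(x_j)]^{n-j}\,dG(x_{i-1})\,dG(x_i)\,dG(x_{j-1})\,dG(x_j), \]

so

\[ \frac{1}{2}E\left[\sum_{i=2}^{n-2}\sum_{j=i+2}^{n}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\left|F(x_j) - F(x_{j-1}) - \frac{1}{n+1}\right|\right] \]
\[ = \frac{1}{2}n(n-1)(n-2)(n-3)\iiiint_{-\infty<X<Y<U<V<\infty}\left|F(Y) - F(X) - \frac{1}{n+1}\right|\left|F(V) - F(U) - \frac{1}{n+1}\right| \]
\[ \qquad \cdot\,[1 - G(V) + G(U) - G(Y) + G(X)]^{n-4}\,dG(X)\,dG(Y)\,dG(U)\,dG(V) \tag{4.8} \]
\[ = \frac{1}{2}n(n-1)(n-2)(n-3)\iiiint_{0<x<y<u<v<1}\left|y - x - \frac{1}{n+1}\right|\left|v - u - \frac{1}{n+1}\right| \]
\[ \qquad \cdot\,[1 - k(v) + k(u) - k(y) + k(x)]^{n-4}\,dk(x)\,dk(y)\,dk(u)\,dk(v). \]

The joint probability density element of $x_{i-1}$, $x_i$, $x_{i+1}$ is

\[ \frac{n!}{(i-2)!\,(n-i-1)!}\,G(x_{i-1})^{i-2}\,[1 - G(x_{i+1})]^{n-i-1}\,dG(x_{i-1})\,dG(x_i)\,dG(x_{i+1}) \]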





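and so

\[ \frac{1}{2}E\left[\sum_{i=2}^{n-1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\left|F(x_{i+1}) - F(x_i) - \frac{1}{n+1}\right|\right] \]
\[ = \frac{1}{2}n(n-1)(n-2)\iiint_{-\infty<X<Y<V<\infty}\left|F(Y) - F(X) - \frac{1}{n+1}\right|\left|F(V) - F(Y) - \frac{1}{n+1}\right| [1 - G(V) + G(X)]^{n-3}\,dG(X)\,dG(Y)\,dG(V) \tag{4.9} \]
\[ = \frac{1}{2}n(n-1)(n-2)\iiint_{0<x<y<v<1}\left|y - x - \frac{1}{n+1}\right|\left|v - y - \frac{1}{n+1}\right| [1 - k(v) + k(x)]^{n-3}\,dk(x)\,dk(y)\,dk(v). \]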


We introduce the symbol $S(p, q)$ as follows:

\[ S(p, q) = \begin{cases} 1 & \text{if } q \le p + \dfrac{1}{n+1}, \\[4pt] -1 & \text{if } q > p + \dfrac{1}{n+1}. \end{cases} \]

Then in the integral on the right of (4.8) we perform a partial integration with
respect to $u$ and add to the integral on the right of (4.9). We get

\[ \frac{1}{2}n(n-1)(n-2)\iiint_{0<x<y<v<1}\frac{1}{n+1}\left|y - x - \frac{1}{n+1}\right| [1 - k(y) + k(x)]^{n-3}\,dk(x)\,dk(y)\,dk(v) \]
\[ - \frac{1}{2}n(n-1)(n-2)\iiiint_{0<x<y<u<v<1} S(u, v)\left|y - x - \frac{1}{n+1}\right| [1 - k(v) + k(u) - k(y) + k(x)]^{n-3}\,dk(x)\,dk(y)\,dk(v)\,du, \]

and now integrating with respect to $v$ in the triple integral and performing partial
integrations with respect to $x$ and collecting terms the sum of (4.8) and
(4.9) becomes

\[ \frac{n(n-1)}{4(n+1)^2} - \frac{n(n-1)}{2(n+1)}\int_0^1\left|y - \frac{1}{n+1}\right| [1 - k(y)]^{n-1}\,dk(y) - \frac{n(n-1)}{2(n+1)}\iint_{0<x<y<1} S(x, y)\,[1 - k(y) + k(x)]^{n-1}\,dx\,dk(y) \]
\[ + \frac{1}{2}n(n-1)\iiint_{0<y<u<v<1} S(u, v)\left|y - \frac{1}{n+1}\right| [1 - k(v) + k(u) - k(y)]^{n-2}\,dk(y)\,dk(v)\,du \]
\[ + \frac{1}{2}n(n-1)\iiiint_{0<x<y<u<v<1} S(u, v)\,S(x, y)\,[1 - k(v) + k(u) - k(y) + k(x)]^{n-2}\,dk(y)\,dk(v)\,dx\,du. \]





Now some tedious, although in principle straightforward, calculations show
that the first three terms of this expression approach

\[ -\frac{1}{4} - \frac{1}{2}a - \frac{1}{2}(1 - b) + \int_0^1 e^{-k'(x)}\,dx, \tag{4.10} \]

that the triple integral approaches

\[ \frac{1}{2}a^2 + \frac{1}{2}a(1 - b) + \frac{1}{2}a - a\int_0^1 e^{-k'(u)}\,du, \tag{4.11} \]

and that the quadruple integral approaches

\[ 2\iint_{0<x<u<1} e^{-k'(x)-k'(u)}\,dx\,du - \int_0^1 e^{-k'(x)}\,dx - (1 - b)\int_0^1 e^{-k'(x)}\,dx \]
\[ \qquad - \frac{1}{2}\iint_{\substack{0<x<u<1 \\ k(x)=k(u)}} dx\,du + (1 - b)^2 + \frac{1}{2}b(1 - b) + \frac{1}{4}. \tag{4.12} \]

Thus collecting the results of (4.3), (4.4), (4.5), (4.7), (4.10), (4.11), and
(4.12) we have

\[ \lim_{n\to\infty} E(\omega_n^2) = 2\iint_{0<x<u<1} e^{-k'(x)-k'(u)}\,dx\,du. \]

Since the integrand is symmetrical in the variables $u$ and $x$ we may write

\[ \lim_{n\to\infty} E(\omega_n^2) = \iint_{\substack{0<x<1 \\ 0<u<1}} e^{-k'(x)-k'(u)}\,dx\,du = \left[\int_0^1 e^{-k'(x)}\,dx\right]^2, \tag{4.13} \]

and this proves the theorem in the case $F'(x) \ne 0$.
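The statement of Theorem 4 can be illustrated by simulation. The sketch below assumes the simplest case $F = G$ (so that the values $F(x_i)$ are uniform on $(0, 1)$ and $k(x) = x$); the function names, sample sizes, and seed are choices made for this illustration only.

```python
import math
import random
import statistics

def omega(sample_f_values):
    """Statistic 0.5 * sum |F(x_i) - F(x_{i-1}) - 1/(n+1)|, with F(x_0) = 0, F(x_{n+1}) = 1."""
    n = len(sample_f_values)
    points = [0.0] + sorted(sample_f_values) + [1.0]
    return 0.5 * sum(abs(points[i] - points[i - 1] - 1.0 / (n + 1)) for i in range(1, n + 2))

def simulate(n, replications, rng):
    # Assumption for this sketch: F = G, so the F(x_i) are uniform on (0, 1).
    return [omega([rng.random() for _ in range(n)]) for _ in range(replications)]

rng = random.Random(0)
small = simulate(20, 2000, rng)
large = simulate(400, 2000, rng)
mean_large = statistics.fmean(large)
var_small = statistics.pvariance(small)
var_large = statistics.pvariance(large)
print(mean_large, 1.0 / math.e, var_small, var_large)
```

Under this assumption the sample mean for the larger $n$ should lie close to $\int_0^1 e^{-1}\,dx = 1/e$, and the sample variance should shrink as $n$ grows.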
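Using the procedure of Theorem 3 we may however extend the theorem to
include the possibility that $F'(x)$ is sometimes zero. But it must be shown
additionally that the sequence $F_m(x)$ can be so chosen that $\omega_{mn}$ converges to $\omega_n$
uniformly in $n$, i.e. that, for a given $\epsilon$, $|\omega_{mn} - \omega_n| < \epsilon$ for $m$ sufficiently large
and for any value of $n$. If this is true then, observing that $0 \le \omega_{mn} + \omega_n \le 2$,
$|\omega_{mn}^2 - \omega_n^2| < 2\epsilon$ and

\[ |E(\omega_{mn}^2) - E(\omega_n^2)| \le E(|\omega_{mn}^2 - \omega_n^2|) \le 2\epsilon \]

independently of $n$. Letting $n \to \infty$,

\[ \left|\left[\int_0^1 e^{-k_m'(x)}\,dx\right]^2 - \lim_{n\to\infty} E(\omega_n^2)\right| \le 2\epsilon, \]

and now letting $m \to \infty$ (the $F_m(x)$ constructed below are such that $k_m'(x) \to k'(x)$)

\[ \left|\left[\int_0^1 e^{-k'(x)}\,dx\right]^2 - \lim_{n\to\infty} E(\omega_n^2)\right| \le 2\epsilon. \]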

n-*oo 





Since $\epsilon$ is arbitrary this implies (4.13), so that the theorem is extended to include
the possibility that $F'(x)$ is sometimes zero. That the sequence $F_m(x)$ can be
chosen so that $\omega_{mn}$ converges to $\omega_n$ uniformly in $n$ can be shown as follows. The
set of points on the $x$ axis for which $F'(x) = 0$ maps into a set of points on the
$y$ axis of measure zero. For any $m$ we may enclose this set on the $y$ axis in an
open set $S$ of measure less than $\frac{1}{m}$. $S$ is the union of disjoint open intervals
$S_i$, $i = 1, 2, \ldots$. The sets $T_i = F^{-1}(S_i)$ on the $x$ axis are disjoint open intervals.
Now we may construct a distribution function $F_m(x)$ which coincides
with $F(x)$ outside $\sum T_i$, is such that $F_m'(x) \ne 0$, and otherwise satisfies the conditions
of the theorem (stated explicitly in Theorem 3). The sequence $F_m(x)$ converges
to $F(x)$. Furthermore




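\[ |\omega_n - \omega_{mn}| = \left|\frac{1}{2}\sum_{i=1}^{n+1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right| - \frac{1}{2}\sum_{i=1}^{n+1}\left|F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1}\right|\right| \]
\[ = \frac{1}{2}\left|\sum_{i=1}^{n+1}\left(\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right| - \left|F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1}\right|\right)\right| \tag{4.14} \]
\[ \le \frac{1}{2}\sum_{i=1}^{n+1}\left|[F(x_i) - F(x_{i-1})] - [F_m(x_i) - F_m(x_{i-1})]\right|. \]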


For any particular set of values of $x_1, x_2, \ldots, x_n$ some (possibly none or possibly
all) of the $x_i$ will fall into intervals of the $\sum T_i$. If this finite set of intervals,
each containing at least one $x_i$, is say $T_1, T_2, \ldots, T_k$, then a simple analysis
of the sum on the right of (4.14) shows that it is less than twice the total length
of the intervals $F(T_1), F(T_2), \ldots, F(T_k)$ and this total length is less than $\frac{1}{m}$.
Thus $|\omega_{mn} - \omega_n| < \frac{1}{m}$ and this result is independent of $n$.


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] B. F. Kimball, "Some basic theorems for developing tests of fit for the case of the non-parametric probability distribution function I," Annals of Math. Stat., Vol. 18 (1947), pp. 540-548.

[3] P. A. P. Moran, "The random division of an interval," Jour. Royal Stat. Soc., Supp., Vol. 9 (1947), pp. 92-98.

[4] N. Smirnoff, "Sur la distribution de ω²," Comptes Rendus de l'Académie des Sciences, Paris, 202 (1932), p. 449.

[5] A. Wald and J. Wolfowitz, "On a test of whether two samples are from the same population," Annals of Math. Stat., Vol. 11 (1940), pp. 147-162.

[6] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.



ON A PROBLEM IN THE THEORY OF k POPULATIONS 1 

By Raghu Raj Bahadur 
University of North Carolina 

1. Summary. In two recent papers, Paulson [1] and Mosteller [2] have called
attention to several unsolved problems in $k$-sample theory. A problem which is
typical of the ones considered in this paper is as follows.

Let $\pi_1, \pi_2, \ldots, \pi_k$ be a set of normal populations, $\pi_i$ having an unknown
mean $m_i$ and variance $\sigma^2$, $G(x, \theta_i)$ being the distribution function which
characterizes $\pi_i$. Samples of equal size are drawn from each population, $\bar{X}_i$ being
the sample means, and $S^2$ the estimate of $\sigma^2$ obtained. The problem is to construct
a suitable decision rule $d = d(\{\bar{X}_i\}; S^2)$ to select one or more populations, the
object being to minimize the expected value of the random distribution function

\[ G(x \mid s(d)) = \sum_{i=1}^{k} Z_i(d)\,G(x, \theta_i) \Big/ \sum_{i=1}^{k} Z_i(d), \]

where $Z_i(d) = 1$ if $\pi_i$ is selected by $d$, and $= 0$ otherwise. It is shown that under
the restriction of impartial decision, the rule $d_k$: “Always select only the population
corresponding to the greatest $\bar{X}_i$” cannot be improved, no matter what $x$
or the true parameter values may be. It follows (i) that $d_k$ is the uniformly best
decision rule in the class of impartial decision rules for all weight functions of type

\[ W = \max_i \{m_i\} - \sum_{i=1}^{k} z_i m_i \Big/ \sum_{i=1}^{k} z_i, \]

and (ii) that the customary $F$ and $t$ tests of analysis of variance are not relevant
to the problem.

This result is an application of Theorem 1 which applies to a number of similar
problems concerning $k$ populations, especially when the populations admit
sufficient statistics for their parameters. Two examples of statistical applications
are given in Section 6.

2. Introduction. It has been recognized for some time that the classical
theory of statistical inference does not provide direct answers to many problems
which are of great interest in the applications. One of them, which arises in
the theory of samples from $k$ populations, is what Mosteller has called “the
problem of the greatest one.” The word “population” is used here for a process,
$\pi(\theta)$ say, which generates independent random variables $X_1, X_2, \ldots$, each $X$
having the same distribution function $P(X < x) = G(x, \theta)$ say, and a set of $X$’s

1 This paper is based on a thesis submitted to the Department of Mathematical Statistics,
University of North Carolina, in partial fulfilment of the requirements for the
Ph.D. degree. This work was sponsored by the Office of Naval Research.



which have been generated by $\pi$ is called a sample from the population. We shall
describe the problem, as also the formulation adopted in the following section,
in terms of two special cases. These cases occur when the $k$ given populations
$\pi_1, \pi_2, \ldots, \pi_k$ are such that $\pi_i$ is characterized by the distribution function

\[ G(x, \theta_i) = h\left(\frac{x - b_i}{c_i}\right), \quad \theta_i = (b_i, c_i), \quad c_i > 0, \quad i = 1, 2, \ldots, k, \]

where $h(x)$ is an absolutely continuous non-decreasing function with $h(-\infty) = 0$, $h(+\infty) = 1$.
Such sets of populations appear frequently in statistical theory and practice, a
given set of normal, or rectangular, or gamma type populations being familiar
instances.

Case 1. Let $X_{ij}$, $j = 1, 2, \ldots, n$ be a sample from the population $\pi_i$, $i =
1, 2, \ldots, k$ where $\pi_i$ is characterized by the distribution function $h\left(\frac{x - b_i}{c}\right)$,
$b_i$ being unknown, and suppose that the statistician is asked to select the population
which he thinks has the greatest $b_i$, but is allowed to select more than
one population if (as a consequence, say, of “insignificant” outcomes of tests of
differences between populations) he does not feel confident enough to select only
one. This situation will occur if, for example, the $X_{ij}$’s are observed yields in an
agricultural experiment in which each of $k$ varieties has been replicated $n$ times,
the yield with variety $\pi_i$ being normally distributed with unknown mean $m_i$ and
variance $\sigma^2$, and the statistician is asked to recommend one or more varieties
for general use. (Cf. Example 1 in Section 6.)

Case 2. Suppose now that the $X_{ij}$’s are samples from populations $\pi_i$ characterized
by distribution functions $h\left(\frac{x - b}{c_i}\right)$, $c_i > 0$ unknown, $i = 1, 2, \ldots, k$,
and the statistician is asked to select the population which he thinks has the
greatest $1/c_i$, but is allowed to select more than one population.² This situation
will occur if, for instance, the $\pi_i$ are factories producing an article having a numerical
quality characteristic $X$, $h\left(\frac{x - b}{c_i}\right)$ being the distribution function of $X$ in the
product of $\pi_i$, and the statistician is required to assign production to one or
more factories, the object being to obtain product of stable quality, $b$ being the
standard characteristic.

It is clear that the usual statistical theory, which confines itself to estimation
of parameters $\theta_i$ and testing of hypotheses of the kind $H_0(\theta_i = \text{constant})$, is
inadequate to deal with problems of this sort, where a definite course of action is
required of the statistician. It is hardly necessary to add that selection is an important
problem in the applications, and the testing of hypotheses is often an
indirect attempt to justify selection. In accordance with Wald’s formulation of


2 There is no essential difference between the problem of the greatest one and the problem 
of the least one. In order to avoid trivial complications, the terminology of the former will 
be used wherever possible. 





the problem of statistical inference,³ we proceed to consider explicitly the purpose
of selection and the “loss” involved in making any particular selection.

3. A class of weight functions. Let $\pi_1, \pi_2, \ldots, \pi_k$ be a given set of populations,
$\pi_i$ being characterized by the distribution function $G(x, \theta_i)$, and let us
denote any particular selection, say $s$, by indicator variables $z_1, z_2, \ldots, z_k$ where
$z_i = 1$ if $\pi_i$ is selected and $= 0$ otherwise. Since any meaningful selection must
concern itself with the random variables generated by the populations selected,
consider the function $G(x \mid s) = \sum_{i=1}^{k} z_i G(x, \theta_i) \big/ \sum_{i=1}^{k} z_i$. $G(x \mid s)$ is a distribution
function, and provides a logical and direct overall picture of the effect of
making the selection $s$, since no distinction is made between the populations
selected. In immediate generalization, we define a “selection” $s$ to be a vector,
$s = (p_1, p_2, \ldots, p_k)$ with $p_i \ge 0$, $\sum_{i=1}^{k} p_i = 1$, and put $G(x \mid s) = \sum_{i=1}^{k} p_i G(x, \theta_i)$.
Roughly speaking, $G(x \mid s)$ is the distribution function which characterizes
the mixed population obtained if sampling rates $p_1, p_2, \ldots, p_k$ are
assigned to $\pi_1, \pi_2, \ldots, \pi_k$ respectively, $p_i = 0$ corresponding to rejection of
$\pi_i$. Henceforth, a selection vector will be called a decision.

Now, if each of the $G(x, \theta_i)$’s were known, an appropriate decision $s$ could be
chosen without resort to sampling. If not, the statistician must construct (in
advance) and use an $s$-valued function of the sample values. Such a function,
say $d$, is called a statistical decision function or decision rule. The decision $s$
according to $d$, say $s(d) = (p_1(d), p_2(d), \ldots, p_k(d))$, is in general a random
vector, so that for any fixed $x$, $G(x \mid s(d))$ is a random variable. Consider the
distribution function $H(x \mid d) = E[G(x \mid s(d))] = \sum_{i=1}^{k} G(x, \theta_i)E[p_i(d)]$, where
$E$ denotes the expectation operator. It represents the average overall effect of
using the decision rule $d$, and so affords a reasonable description of the performance
of $d$. Clearly, the problem is to construct $d$ in such a way that $H(x \mid d)$ has
desirable properties.
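The mixed distribution function $G(x \mid s) = \sum p_i G(x, \theta_i)$ is easy to evaluate once the populations are specified. A minimal sketch, assuming normal populations with a common standard deviation; the means, the two decisions compared, and the function names are hypothetical choices for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """Distribution function of a normal population with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, proportions, means, sd):
    """G(x | s) = sum_i p_i G(x, theta_i) for a decision s = (p_1, ..., p_k)."""
    if any(p < 0 for p in proportions) or not math.isclose(sum(proportions), 1.0):
        raise ValueError("proportions must be non-negative and sum to 1")
    return sum(p * normal_cdf(x, m, sd) for p, m in zip(proportions, means))

means = [0.0, 1.0, 2.0]              # assumed population means
select_last = [0.0, 0.0, 1.0]        # all weight on the population with the greatest mean
equal_split = [1 / 3, 1 / 3, 1 / 3]  # equal sampling rates

at_one_last = mixture_cdf(1.0, select_last, means, 1.0)
at_one_equal = mixture_cdf(1.0, equal_split, means, 1.0)
print(at_one_last, at_one_equal)
```

In this setting a smaller value of $G(x \mid s)$ at every $x$ means the mixed population is stochastically larger, which is the sense in which Case 1 asks for $H(x \mid d)$ to be minimized.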

The “desirable properties” will depend, of course, on the particular problem
being considered. Returning to our two cases, denote the arbitrary but given
set of all possible parameter points $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$ by $\Omega$, and let $D$
be a given class of decision rules $d = d(\{X_{ij}\})$. Then, in Case 1 we wish to
choose $d^* \in D$ such that $H(x \mid d^*) = \inf_{d \in D} H(x \mid d)$ for every $x$ and every $\omega \in \Omega$. In
Case 2, we wish to choose $d^*$ so that for every $x$ and every $\omega$ we have
$H(x \mid d^*) = \inf_{d \in D} H(x \mid d)$ whenever $x < b$, and $= \sup_{d \in D} H(x \mid d)$ whenever $x > b$.
These requirements are very strong, and in general no such $d^*$ will exist without
heavy restrictions on $\Omega$ and on $D$. (Cf. however the corollary to Theorem 1. It
will be found that in a number of cases no restrictions on $\Omega$ are required provided
that $D$ is the class defined there.) For some purposes, it may be sufficient to
consider functionals of $H(x \mid d)$. The functionals which are most useful in the
applications are the moments. Thus, one may wish to find $d^*$ such that
$\alpha(d^*) = \sup_{d \in D} \alpha(d)$, where

\[ \alpha(d) = \int_{-\infty}^{+\infty} g(x)\,dH(x \mid d), \]

$g(x)$ being some appropriate function.


3 See, for example, [3], Chapter VI.





For example, in Case 1 we may take $g(x) = x$. Then $\alpha(d)$ is the mean of a random
variable having $H(x \mid d)$ for its distribution function, and constructing a suitable
$d$ to maximize $\alpha(d)$ is “the problem of the greatest mean.” Again, in Case 2 we
may take $g(x) = -(x - b)^2$, and in that case maximizing $\alpha(d)$ would be “the
problem of the smallest variance.”⁴

In terms of mixtures of distributions, $H(x \mid d)$ is the mixture of $G(x \mid s)$ with
respect to $\delta$, where $\delta$ is the probability measure induced by the decision rule $d$
on the class of Borel sets in the space of all possible decisions $s$. It follows by
the use of Theorem 5 in [4], or otherwise directly, that maximizing $\alpha(d)$ is equivalent
to maximizing the expected value ($\delta$) of $\sum_{i=1}^{k} p_i \int_{-\infty}^{+\infty} g(x)\,dG(x, \theta_i)$. Writing
$g_i = \int_{-\infty}^{+\infty} g(x)\,dG(x, \theta_i)$, one may say that the object is to construct $d$ in such
a way that the expected value ($\delta$) of the “weight function”

\[ W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i g_i \]

is minimized for every $\omega$. $W$ represents the “loss” incurred by choosing the decision
$s$ when the true parameter point is $\omega$. It will be seen that $W$ defined according
to (A) in Section 5 includes essentially all weight functions which are likely
to be of interest in the type of problem considered in this paper.

We have so far not emphasized the obvious fact that the probability measure $\delta$
which is induced by $d$ on the space of decisions will in general depend on the
unknown parameter point $\omega$. Therefore, the expected value ($\delta$) of $W$ is to be
written as $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$ say. Following the usual terminology, we
shall call $r(d \mid \omega)$ the risk function of the rule $d$, and shall say that $d^* \in D$ is the
uniformly best rule in the class $D$ if $r(d^* \mid \omega) = \inf_{d \in D} r(d \mid \omega)$ for all $\omega \in \Omega$.


4. A class of decision rules. The class of decision rules to which we shall
confine ourselves is rather limited, and may be described as follows, with reference
to the previous sections:

(i) Given independent random variables $\{X_{ij}\}$, $j = 1, 2, \ldots, n$; $i = 1, 2, \ldots, k$
from the $k$ populations $\pi_i$, let

\[ X_i = \varphi(X_{i1}, X_{i2}, \ldots, X_{in}), \quad i = 1, 2, \ldots, k, \quad \text{and} \quad Y = \psi(\{X_{ij}\}), \]

where $X_1, X_2, \ldots, X_k$; $Y$ is an independent set, and the $X_i$’s have frequency
functions. The choice of $\varphi$ and $\psi$ will depend upon particular cases: in
Case 1, $X_1, X_2, \ldots, X_k$; $Y$ will be statistics relevant to the estimation of


4 An unpublished theorem of Herbert Robbins insures that if a $d^*$ satisfies the
requirements of the preceding paragraph, it will also maximize all functionals $\alpha(d)$ corresponding
to such functions $g(x)$.





$b_1, b_2, \ldots, b_k$; $c$ respectively, and in Case 2 they will be relevant to
$c_1, c_2, \ldots, c_k$; $b$.⁵

(ii) Given the statistics $\{X_i\}$; $Y$, $D$ is the class of all impartial decision
rules which are based on them. A decision rule $d = d(\{X_i\}; Y)$ is said to be
impartial if it has the following structure. Let $X_{(1)} < X_{(2)} < \cdots < X_{(k)}$
be the ordered $X_i$’s. Then $d$ defines non-negative random variables $\lambda_j(X_{(1)},
X_{(2)}, \ldots, X_{(k)}; Y)$, $j = 1, 2, \ldots, k$ such that $\sum_{j=1}^{k} \lambda_j = 1$, and $\lambda_j$ is the
proportion $p(d)$ which is assigned by $d$ to the $\pi$ corresponding to $X_{(j)}$. We
use the term “impartial” for such decision rules because they determine the
proportions $[\lambda_1, \lambda_2, \ldots, \lambda_k]$ without regard to which $X$ belongs to which
population, and then assign these proportions in strict order of the $X_i$’s.
We shall specify the intuitively plausible class of impartial decision rules for the
important normal cases, and give a few instances of such rules.

Suppose first that the $X_{ij}$’s are from normal populations having means $m_i$
and a common variance $\sigma^2$, and that we are interested in the problem of the
greatest mean. $D$ is then the class of all impartial decision rules which are based
on the statistics

\[ X_i = \bar{X}_i = \sum_{j=1}^{n} X_{ij}/n, \quad i = 1, 2, \ldots, k, \]

\[ Y = S^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / k(n - 1). \]

The numerical factors are of no importance, and may be omitted (Cf. footnote 4.
See also Example 2 in Section 6, where such factors have been omitted for convenience).
A rather simple member of $D$ is the rule $[\lambda_{k-1} = 1/3, \lambda_k = 2/3]$, i.e.
“Always assign the proportion 2/3 to the population which has the greatest
$\bar{X}_i$, and the proportion 1/3 to the population with the second greatest.” In using
this rule although the $\lambda_j$’s remain constant from sample to sample, the decision
$s(d)$ is a random vector. In general, however, the $\lambda_j$’s will themselves be random
variables. This is the case if, for instance, one insists on utilising the standard
test of differences between populations, and uses the impartial rule “Perform the
$F$ test of $H_0(m_i = \text{constant})$ at the five per cent level. If $H_0$ is rejected, assign
the proportion 1 to the population which has the greatest $\bar{X}_i$. If not, assign
equal proportions to all populations for which $\bar{X}_i > \sum_{i=1}^{k} \bar{X}_i/k$, and zero proportions
to the rest.” Another type of impartial decision rule according to which the
$\lambda_j$’s are random variables will be described at the end of Example 1 in the next
section. Now, it is (intuitively) clear that if the sample size $n$ is indefinitely
large, the rule $[\lambda_k = 1]$, i.e., “Always assign the proportion 1 to the population

5 It is unnecessary to specify here the exact relation between the statistics and the
parameters: (a) the definition of the parameter which determines a distribution function
$G(x, \theta)$ is more or less arbitrary, e.g., instead of writing $\theta = (b, c)$ we may write $\theta =
(b^3/c, \cosh c)$, and (b) $D(\varphi_1; \psi_1) = D(\varphi_2; \psi_2)$, provided that $\varphi_2 = f(\varphi_1)$, $\psi_2 = g(\psi_1)$, where
$f(x)$, $g(x)$ are strictly monotonic functions. It will be seen that Theorem 1 is invariant under
such transformations of parameters and/or of statistics.





with the greatest $\bar{X}_i$”, cannot be improved, no matter what the true parameter
values may be. Our main result (Theorem 1) asserts that the statement is in
fact valid for any $n$, provided that one restricts oneself to the class of impartial
decision rules.
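The comparison between impartial rules can be illustrated by simulation. The following is a minimal sketch for the problem of the greatest mean, assuming three normal populations with known common standard deviation; the true means, sample size, seed, and function names are all hypothetical, and the third rule (equal shares) is added here only as a baseline.

```python
import random

def simulate_risk(rule, means, sd, n, replications, rng):
    """Monte Carlo estimate of r(d | w) = max_i m_i - E[sum_i p_i(d) m_i]."""
    k = len(means)
    total_gain = 0.0
    for _ in range(replications):
        # Sample means of n observations are normal with standard deviation sd / sqrt(n).
        xbar = [rng.gauss(m, sd / n ** 0.5) for m in means]
        order = sorted(range(k), key=lambda i: xbar[i])  # order[j]: population with (j+1)-th smallest mean
        lambdas = rule(sorted(xbar))
        total_gain += sum(lam * means[order[j]] for j, lam in enumerate(lambdas))
    return max(means) - total_gain / replications

def select_greatest(ordered):        # the rule [0, ..., 0, 1]
    return [0.0] * (len(ordered) - 1) + [1.0]

def one_third_two_thirds(ordered):   # the rule [lambda_{k-1} = 1/3, lambda_k = 2/3]
    return [0.0] * (len(ordered) - 2) + [1 / 3, 2 / 3]

def equal_shares(ordered):           # baseline: ignore the data entirely
    return [1.0 / len(ordered)] * len(ordered)

means = [0.0, 0.5, 1.0]   # assumed true means, unknown to the rules
risks = {
    rule.__name__: simulate_risk(rule, means, sd=1.0, n=5, replications=20000, rng=random.Random(1))
    for rule in (select_greatest, one_third_two_thirds, equal_shares)
}
print(risks)
```

Each rule receives only the ordered sample means, which is what makes it impartial in the sense defined above. A simulation of this kind illustrates the ordering of the risks for one parameter point; it is not a substitute for the proof.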

In a similar way, if the $X_{ij}$’s are from normal populations having a common
mean $m$ and variances $\sigma_i^2$, $D$ would be the class of all impartial decision rules
which are based on the statistics

\[ X_i = S_i^2 = \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2/(n - 1), \quad i = 1, 2, \ldots, k, \]

\[ Y = \sum_{i=1}^{k}\sum_{j=1}^{n} X_{ij}/kn, \]

and analogous remarks will apply to this case.

It should be observed that in a given case the appropriate statistics $\{X_i\}$; $Y$
may not be as obvious as in the case of populations like the normal which admit
sufficient statistics for their parameters. This real difficulty is not to be confused
with the ambiguities mentioned in footnote 4. Furthermore, given the $X_i$’s
there may not exist $Y = \psi(\{X_{ij}\})$ which is independent of the $X_i$’s: we shall then
assume, without invalidating our result, that the parameter which $Y$ is supposed
to estimate is known. Theorem 1 becomes operative only after such questions
have been resolved.

5. The uniformly best decision rule. It is convenient to define here some
terms which will be used subsequently without further explanation. All functions
are assumed to be Borel measurable. Sets will be denoted by curly brackets: thus
$\{f = c\}$ is the set on which $f = c$ holds, and $\{a_i\}$ is the set of all $a_i$ in question.
“Measure” will refer to ordinary Lebesgue measure in the $xy$ plane.

Definition 1. Given $k$ independent random variables $X_i$, $i = 1, 2, \ldots, k$,
such that each $X_i$ has a frequency function, let $X_{(j)}$, $j = 1, 2, \ldots, k$, be the
ordered set, $X_{(j)}$ being the $j$th $X_i$ in ascending order of magnitude. Then $A_{ij} =
\{X_i = X_{(j)}\}$, and $a_{ij}$ is the characteristic function of the set $A_{ij}$, that is, $a_{ij} = 1$
for any point of $A_{ij}$ and $= 0$ elsewhere.

Since the $X_i$’s have a joint distribution which is absolutely continuous, the
sets $A_{ij}$ are well defined with probability one. Clearly, we have $\sum_{i=1}^{k} a_{ij} = 1$
for every $j$ and $\sum_{j=1}^{k} a_{ij} = 1$ for every $i$, with probability one.

Definition 2. Let $\beta = (b_1, b_2, \ldots, b_k)$ be a vector of real numbers $b_i$,
and $\phi = (f_1, f_2, \ldots, f_k)$ a vector of real-valued functions $f_i(x)$ defined for every
real $x$. We shall say that $\phi \in T(\beta)$ if for any $r, s = 1, 2, \ldots, k$ for which $b_r \le b_s$,
the set $\{f_r(x)f_s(y) < f_r(y)f_s(x),\ x < y\}$ is of measure zero.
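The condition $\phi \in T(\beta)$ can be checked on a grid for a concrete family. A minimal sketch, assuming unit-variance normal frequency functions indexed by their means; the grid and the function names are hypothetical, and a grid check is of course only an illustration of the condition, not a proof of it.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Frequency function of a normal variable with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def violates_ordering(mean_r, mean_s, grid):
    """Grid pairs x < y at which f_r(x) f_s(y) < f_r(y) f_s(x)."""
    return [
        (x, y)
        for x in grid for y in grid
        if x < y and normal_pdf(x, mean_r) * normal_pdf(y, mean_s)
        < normal_pdf(y, mean_r) * normal_pdf(x, mean_s)
    ]

grid = [i / 4.0 for i in range(-12, 13)]            # assumed grid on [-3, 3]
ordered_case = violates_ordering(0.0, 1.0, grid)    # b_r <= b_s: no violating pairs expected
reversed_case = violates_ordering(1.0, 0.0, grid)   # b_r > b_s: every pair x < y violates
print(len(ordered_case), len(reversed_case))
```

For this family the ratio $f_r(x)f_s(y) / f_r(y)f_s(x)$ equals $\exp[(b_s - b_r)(y - x)]$, so the exceptional set is empty whenever $b_r \le b_s$.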

We require the following

Lemma. Suppose that $X_1, X_2, \ldots, X_k$; $Y$ are independent random variables,
$X_i$ having a frequency function $f_i(x)$, and that $\phi = (f_1, f_2, \ldots, f_k) \in T(\beta)$, where
$\beta = (b_1, b_2, \ldots, b_k)$ with

(1) $b_1 \le b_2 \le \cdots \le b_k$.





Then, for any non-negative random variable $\lambda = \lambda(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y)$ and
any $p, q, m = 1, 2, \ldots, k$ with $p \le q$, we have

\[ \sum_{i=m}^{k} E(\lambda a_{ip}) \le \sum_{i=m}^{k} E(\lambda a_{iq}). \tag{2} \]

Proof. Since (2) holds trivially if $p = q$ or if $m = 1$, suppose that $p < q$
and $m \ge 2$. Writing $B(m, j) = \sum_{i=m}^{k} A_{ij}$, (2) is equivalent to
$\int_{B(m,q)} \lambda\,dP \ge \int_{B(m,p)} \lambda\,dP$, and hence to

\[ \int_{B(m,q)B'(m,p)} \lambda\,dP \ge \int_{B(m,p)B'(m,q)} \lambda\,dP, \tag{3} \]

where $B'$ denotes the complement of $B$, and $P$ the probability measure in $(x_1,
x_2, \ldots, x_k, y)$ space.

For any permutation $i_1 i_2 \cdots i_k$ of $1\,2\,3 \cdots k$, define $\mathcal{O}(i_1 i_2 \cdots i_k) =
A_{i_1 1}A_{i_2 2} \cdots A_{i_k k}$. Clearly, the $\mathcal{O}$’s corresponding to different permutations are
disjoint and each of the sets $B(m, q)B'(m, p)$ and $B(m, p)B'(m, q)$ is the set-theoretic
sum of certain $\mathcal{O}$’s. Now, it is easy to see that

(4) $\mathcal{O} \subset B(m, q)B'(m, p)$ if and only if $i_p = 1$, or $2$, $\ldots$, or $m - 1$, and $i_q = m$, or $m + 1$, $\ldots$, or $k$;
    $\mathcal{O}^* \subset B(m, p)B'(m, q)$ if and only if $i_p^* = m$, or $m + 1$, $\ldots$, or $k$, and $i_q^* = 1$, or $2$, $\ldots$, or $m - 1$.

Hence a one-one correspondence between subsets $\mathcal{O}(i_1 \cdots i_k)$ of $B(m, q)B'(m, p)$
and subsets $\mathcal{O}^* = \mathcal{O}(i_1^* \cdots i_k^*)$ of $B(m, p)B'(m, q)$ exists through interchange
of the $p$th and $q$th elements of the defining permutations, the other elements
remaining the same. It will be sufficient to prove that if $\mathcal{O}$ and $\mathcal{O}^*$ are any pair of
corresponding subsets, the integral of $\lambda$ over $\mathcal{O}$ is greater than or equal to its
integral over $\mathcal{O}^*$, for then (3) will follow by addition.

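It is clear that for any $\mathcal{O}$,

\[ \int_{\mathcal{O}} \lambda\,dP = \int_R \lambda(t_1, t_2, \ldots, t_k; y)\left[\prod_{r=1}^{k} f_{i_r}(t_r)\right]\prod_{r=1}^{k} dt_r\,dF(y), \tag{5} \]

where $R$ is the domain $\{t_1 < t_2 < \cdots < t_k\}$ and $F(y)$ is the distribution function
of $Y$. Let $\mathcal{O}$ and $\mathcal{O}^*$ be any pair of corresponding subsets. It follows from (5)
that

\[ \int_{\mathcal{O}} \lambda\,dP - \int_{\mathcal{O}^*} \lambda\,dP = \int_R Q\left[\prod_{r \ne p, q} f_{i_r}(t_r)\right]\prod_{r=1}^{k} dt_r\,dF(y), \]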





where

\[ Q = \lambda(t_1, t_2, \ldots, t_k; y)\,[f_{i_p}(t_p)f_{i_q}(t_q) - f_{i_q}(t_p)f_{i_p}(t_q)]. \tag{6} \]

From (4) and (1) we have $b_{i_p} \le b_{i_q}$. Since $p < q$ implies that $t_p < t_q$ over $R$,
and $\phi \in T(\beta)$, it follows that the expression in square brackets in (6) is (except
perhaps for a set of measure zero) non-negative over $R$. Since $\lambda$ is also non-negative,
it follows that $Q$ is non-negative over $R$, and the Lemma is proved.
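Inequality (2) of the Lemma can be illustrated by simulation in the special case $\lambda \equiv 1$, where $\sum_{i \ge m} E(a_{ij})$ is the probability that the $j$th smallest observation comes from one of the populations $m, m+1, \ldots, k$. A minimal sketch, assuming three unit-variance normal variables; the means, seed, and function name are hypothetical.

```python
import random

def rank_membership_probability(means, m, j, replications, rng):
    """Estimate sum_{i >= m} P(X_i = X_(j)) for independent unit-variance normals.

    Indices m and j are 1-based, as in the Lemma; lambda is taken to be identically 1.
    """
    hits = 0
    for _ in range(replications):
        x = [rng.gauss(mu, 1.0) for mu in means]
        order = sorted(range(len(x)), key=lambda i: x[i])
        if order[j - 1] + 1 >= m:      # 1-based index of the population holding rank j
            hits += 1
    return hits / replications

means = [0.0, 1.0, 2.0]   # assumed values with b_1 <= b_2 <= b_3
left = rank_membership_probability(means, m=2, j=1, replications=20000, rng=random.Random(2))
right = rank_membership_probability(means, m=2, j=3, replications=20000, rng=random.Random(2))
print(left, right)
```

Here `left` corresponds to $p = 1$ and `right` to $q = 3$ with $m = 2$; the Lemma asserts that the first cannot exceed the second.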

We shall now state and prove the main result. Note that the statistic Y is not 
necessarily real-valued. 

Theorem 1. Suppose that

(A). $\Omega$ is a given set of points $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$. $\beta(\omega) = (b_1, b_2, \ldots, b_k)$
and $\gamma(\omega) = (g_1, g_2, \ldots, g_k)$ are defined for every $\omega$ such that $b_p \le b_q$
implies $g_p \le g_q$ for every $p, q = 1, 2, \ldots, k$.
Given an $s = (p_1, p_2, \ldots, p_k)$ with $p_i \ge 0$ and $\sum_{i=1}^{k} p_i = 1$,

\[ W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i g_i. \]

(B). $X_1, X_2, \ldots, X_k$; $Y$ are independent random variables, each $X_i$ having
a frequency function $f(x, \theta_i) = f_i(x)$ say, and $\phi(\omega) = (f_1, f_2, \ldots, f_k)$.

(C). $D$ is the class of all decision rules $d$ such that
$d = d(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y) = [\lambda_1, \lambda_2, \ldots, \lambda_k]$, $\lambda_j \ge 0$, $\sum_{j=1}^{k} \lambda_j = 1$, and $s(d)
= (p_1(d), p_2(d), \ldots, p_k(d))$ where $p_i(d) = \sum_{j=1}^{k} \lambda_j a_{ij}$, $i = 1, 2, \ldots, k$.
Given $d \in D$, $r(d \mid \omega) = E[W(\omega, s(d)) \mid \omega]$.

(D). For every $\omega$, $\phi \in T(\beta)$.⁶

Then, for every $\omega$, $r(d_1 \mid \omega) = \sup_{d \in D} r(d \mid \omega)$ and $r(d_k \mid \omega) = \inf_{d \in D} r(d \mid \omega)$, where
$d_1 = [1, 0, 0, \ldots, 0]$ and $d_k = [0, 0, \ldots, 0, 1]$.

Corollary. Suppose that $\pi_i$, $i = 1, 2, \ldots, k$ are populations characterized
by distribution functions $G(x, \theta_i) = h\left(\frac{x - b_i}{c_i}\right)$, $c_i > 0$. For any fixed $x$, let
$G(x \mid \omega, s) = \sum_{i=1}^{k} p_i G(x, \theta_i)$, and $H(x \mid d, \omega) = E[G(x \mid \omega, s(d)) \mid \omega]$.

Case 1. If for every $\omega$, (i) $c_1 = c_2 = \cdots = c_k$,
(ii) $\phi \in T(\beta)$, where $\beta = (b_1, b_2, \ldots, b_k)$,
then, for every $\omega$,

\[ H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega). \]

Case 2. If for every $\omega$, (i) $b_1 = b_2 = \cdots = b_k = b(\omega)$, say,
(ii) $\phi \in T(\beta)$, where $\beta = (c_1, c_2, \ldots, c_k)$,
then, for every $\omega$,

\[ H(x \mid d_1, \omega) = \begin{cases} \inf_{d \in D} H(x \mid d, \omega) & \text{if } x < b(\omega), \\ \sup_{d \in D} H(x \mid d, \omega) & \text{if } x > b(\omega). \end{cases} \]

6 Note that $\phi \in T(-\beta)$ is equivalent to $\phi^* \in T(\beta)$, where $\phi^* = (f_1^*, f_2^*, \ldots, f_k^*)$, and $f_i^*$ is
the frequency function of $X_i^* = -X_i$.


Proof. Choose and fix an arbitrary $\omega \in \Omega$. Without loss of generality we may
assume the notation to be so chosen (by simultaneous interchanges of indices $i$
in each of $\{\theta_i\}$, $\{b_i\}$, $\{g_i\}$, $\{p_i\}$, $\{X_i\}$, $\{f_i\}$, and $\{a_{ij}, j = 1, 2, \ldots, k\}$) that (1)
holds. It then follows that $g_1 \le g_2 \le \cdots \le g_k$ and we write

(7) $g_i = g_1 + h_1 + h_2 + \cdots + h_i$, $h_i \ge 0$, $i = 1, 2, \ldots, k$

(so that $h_1 = 0$). Choose and fix an arbitrary member of the class of impartial decision rules,
say $d = [\lambda_1, \lambda_2, \ldots, \lambda_k]$. We have

\[ r(d \mid \omega) = \max_i \{g_i\} - \sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}). \tag{8} \]

Now, since $\sum_{i,j=1}^{k} E(\lambda_j a_{ij}) = 1$,

\[ \sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}) = \sum_{i,j=1}^{k} (g_1 + h_1 + \cdots + h_i)E(\lambda_j a_{ij}) = g_1 + \sum_{m,j=1}^{k}\left[\sum_{i=m}^{k} E(\lambda_j a_{ij})\right]h_m. \tag{9} \]

Since $\lambda_j = \lambda_j(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y) \ge 0$, it follows from the Lemma that

\[ \sum_{i=m}^{k} E(\lambda_j a_{ij}) \le \sum_{i=m}^{k} E(\lambda_j a_{ik}) \quad \text{for every } m \text{ and every } j, \tag{10} \]

by writing $\lambda = \lambda_j$, $p = j$, and $q = k$ in (2). By using (7), (9) and (10) it follows
that

\[ \sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}) \le g_1 + \sum_{m,j=1}^{k}\left[\sum_{i=m}^{k} E(\lambda_j a_{ik})\right]h_m = g_1 + \sum_{m=1}^{k}\sum_{i=m}^{k} h_m E(a_{ik}) = \sum_{i=1}^{k} g_i E(a_{ik}). \tag{11} \]

Therefore, by (8) and (11),

\[ r(d \mid \omega) \ge \max_i \{g_i\} - \sum_{i=1}^{k} g_i E(a_{ik}) = r(d_k \mid \omega), \tag{12} \]

by definition of $d_k$. The inequality $r(d \mid \omega) \le r(d_1 \mid \omega)$ follows from (8) and (9)
by a similar use of the Lemma. Since both $d \in D$ and $\omega \in \Omega$ are arbitrary, this
completes the proof of Theorem 1.
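For $k = 2$ normal populations and rules with constant proportions, the risk can be written in closed form, which makes the conclusion of Theorem 1 easy to inspect. The following is a minimal sketch under those assumptions (problem of the greatest mean, known common standard deviation); the parameter values and function names are hypothetical, and the class of constant rules is only a subset of the class $D$ treated in the theorem.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk_two_populations(lambda_greatest, m1, m2, sd, n):
    """Exact risk of the constant rule [1 - lambda, lambda] for k = 2 normal populations.

    The weight lambda_greatest goes to the population with the larger sample mean,
    and the loss is max(m1, m2) minus the expected mean of the mixed population.
    """
    # P(sample mean of population 1 exceeds that of population 2).
    p1_greatest = normal_cdf((m1 - m2) / (sd * math.sqrt(2.0 / n)))
    mean_of_greatest = p1_greatest * m1 + (1.0 - p1_greatest) * m2
    mean_of_smallest = m1 + m2 - mean_of_greatest
    gain = lambda_greatest * mean_of_greatest + (1.0 - lambda_greatest) * mean_of_smallest
    return max(m1, m2) - gain

# Assumed parameter values, chosen only for illustration.
risks = [risk_two_populations(lam / 10.0, m1=0.0, m2=1.0, sd=2.0, n=4) for lam in range(11)]
print(risks)
```

The first entry is the risk of $d_1 = [1, 0]$ and the last that of $d_k = [0, 1]$; within this restricted family the risk is linear in the weight given to the greater sample mean.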





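The verification of the corollary is as follows. Choose and fix an arbitrary $x$
and write $\xi_i = \xi_i(\omega) = G(x, \theta_i)$.

Case 1. Let $\gamma(\omega) = (1 - \xi_1, 1 - \xi_2, \ldots, 1 - \xi_k)$. Then $r(d \mid \omega) =
H(x \mid d, \omega) - \min_i \{\xi_i\}$, and it follows from the Theorem that $H(x \mid d_1, \omega) =
\sup_{d \in D} H(x \mid d, \omega)$ and $H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega)$, for all $\omega$.

Case 2. Let

\[ \gamma(\omega) = \begin{cases} (\xi_1, \xi_2, \ldots, \xi_k) & \text{if } b(\omega) > x, \\ (1 - \xi_1, 1 - \xi_2, \ldots, 1 - \xi_k) & \text{otherwise.} \end{cases} \]

Then we have

\[ r(d \mid \omega) = \begin{cases} \max_i \{\xi_i\} - H(x \mid d, \omega) & \text{if } b(\omega) > x, \\ H(x \mid d, \omega) - \min_i \{\xi_i\} & \text{otherwise,} \end{cases} \]

so that

\[ H(x \mid d_1, \omega) = \begin{cases} \inf_{d \in D} H(x \mid d, \omega) & \text{if } b(\omega) > x, \\ \sup_{d \in D} H(x \mid d, \omega) & \text{otherwise,} \end{cases} \]

and conversely for $H(x \mid d_k, \omega)$, for all $\omega$.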

The preceding proofs suggest that perhaps (D) is not a necessary condition,
but the following theorem for the case of two populations shows that it is indispensable
if Theorem 1 is to hold in general.

Theorem 2. Suppose that (A), (B), and (C) hold with $k = 2$ and $\theta_1$, $\theta_2$ real-valued,
that the set $\Omega$ of points $\omega = (\theta_1, \theta_2)$ is denumerable, that $\beta(\omega) = \omega$, that
$\theta_1 \ne \theta_2$ for any $\omega$, and that $Y$ is a fixed constant. Let $\mu(\omega) = \min_i \{\theta_i\}$, $\nu(\omega) =
\max_i \{\theta_i\}$, and defining the sets

\[ R(\omega) = \{f(t_1, \mu)f(t_2, \nu) < f(t_1, \nu)f(t_2, \mu),\ t_1 < t_2\}, \]
\[ S(\omega) = \{f(t_1, \mu)f(t_2, \nu) > f(t_1, \nu)f(t_2, \mu),\ t_1 < t_2\} \]

in the $t_1, t_2$-plane, put

\[ R^*(t_1, t_2) = \sum_{\omega} R(\omega), \qquad S^*(t_1, t_2) = \sum_{\omega} S(\omega). \]

Then a uniformly best decision rule in the class $D$ exists if and only if the set $R^*S^*$
is of measure zero. Subject to existence, the uniformly best rule, say $d^*$, may be
defined as

\[ d^* = \begin{cases} [1, 0] & \text{if } (X_{(1)}, X_{(2)}) \in R^*, \\ [0, 1] & \text{otherwise.} \end{cases} \]

The proof is quite simple, and will not be given. It is clear that under the
hypotheses of this theorem, the conclusion of Theorem 1 is valid if and only if
the set $R^*$ is of measure zero, that is, if and only if condition (D) holds.





6. Examples and discussion. We begin with two applications of Theorem 1. 

Example 1. Suppose that grain is to be raised on a given area, say $A$, of land.
$k$ varieties, $\pi_1, \pi_2, \dots, \pi_k$ say, are available, the yields per unit area being
normally distributed with unknown means $m_i$ and a common variance $\sigma^2$, also
unknown. A preliminary field experiment (in which $n$ plots of unit area were
assigned to each variety) has been carried out, and $\{X_{ij}\}$, $j = 1, 2, \dots, n$;
$i = 1, 2, \dots, k$ is the set of independent plot-yields obtained. The statistician
is asked to suggest how the available land should be divided between the $k$
varieties, the object being to make the total expected yield as large as possible.$^7$

Suppose that an area $A p_i$ is assigned to $\pi_i$, $i = 1, 2, \dots, k$, with $\sum_{i=1}^{k} p_i = 1$.
Then the expected total yield is $\sum_{i=1}^{k} A p_i m_i$. Our object is to choose the set
$(p_1, p_2, \dots, p_k) = s$ so as to minimize the "loss"

$$W(\omega, s) = \max_i \{A m_i\} - \sum_{i=1}^{k} A p_i m_i.$$

Since the $m_i$'s are unknown, one must construct an appropriate $s$-valued
function of the $X_{ij}$'s, say $d$, and set $s(d) = d(\{X_{ij}\})$. The expected "loss" in using
this procedure is given by $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$, and the problem is to
construct a $d$ which makes $r(d \mid \omega)$ as small as possible. (See (A) and (C). Here
we have set $\theta_i = (m_i, \sigma)$, $\omega = (\theta_1, \theta_2, \dots, \theta_k)$, $\beta(\omega) = (m_1, m_2, \dots, m_k)$ and
$\gamma(\omega) = (A m_1, A m_2, \dots, A m_k)$.)

Let $\bar X_i = \sum_{j=1}^{n} X_{ij}/n$, $i = 1, 2, \dots, k$ and
$S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar X_i)^2 / k(n - 1)$.
Since $\bar X_1, \bar X_2, \dots, \bar X_k, S^2$ is a set of sufficient statistics, it is easy to see
by taking conditional expectations that corresponding to any decision rule based
on the $X_{ij}$'s, there exists one defined in terms of the $\bar X_i$'s and $S^2$ alone such that
the risk functions $r$ of the two are identically equal for all possible values of
the unknown parameters. Clearly, one may confine oneself to decision rules of
the type $d(\{\bar X_i\}; S^2)$. Now, the frequency function of $\bar X_i$ is
$f_i(x) = (n/2\pi\sigma^2)^{1/2} \exp[-n(x - m_i)^2/2\sigma^2]$, and it is readily seen that $m_r < m_s$ and
$x < y$ imply $f_r(x) f_s(y) > f_s(x) f_r(y)$. It follows that in the class of all impartial
procedures which are based on $\{\bar X_i\}$, $S^2$, the uniformly best procedure is to assign
the whole area $A$ to the variety with the greatest observed mean yield. (Note that
by the corollary to Theorem 1, a much stronger result than the one required
here holds. Cf. footnote 3.)
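
The rule reached in Example 1 can be sketched in a few lines of Python. This is
only an illustration: the function name `allocate_land`, the dictionary layout of
the plot yields, and the tie-breaking by first occurrence are assumptions made
for the example and are not part of the paper.

```python
from statistics import mean


def allocate_land(plot_yields):
    """Return proportions p_i that give the whole area to one variety.

    plot_yields: dict mapping a variety label to the list of its n plot
    yields (hypothetical input layout). The variety with the greatest
    observed mean yield receives proportion 1; ties go to the first label.
    """
    means = {variety: mean(ys) for variety, ys in plot_yields.items()}
    best = max(means, key=means.get)  # variety with the greatest sample mean
    return {variety: (1.0 if variety == best else 0.0) for variety in means}


# Hypothetical yields for three varieties, two plots each.
proportions = allocate_land(
    {"pi1": [10.0, 12.0], "pi2": [13.0, 14.0], "pi3": [9.0, 20.0]}
)
```

Note that $S^2$ does not enter the rule at all; only the ordering of the sample
means matters.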

Although Paulson did not set up a weight function in his discussion of the 
selection problem for the present case of samples of equal size from k normal 
populations having unknown means and a common variance, also unknown, he 

7 A double expectation is involved: the expected consequence of a given decision, and
the expected decision in using a particular decision rule. The argument given is justified
since it is assumed that the random variables generated by the $\pi$'s subsequent to decision
are independent of the random variables on which decision is based. Cf. Section 3. This
remark applies to Example 2 also.



PROBLEM OF k POPULATIONS 




gave a class $\{d_c\}$ of decision rules and evaluated some probabilities (P(G'i) and
$P^*$, [1], pp. 96-97) which suggest that some of the applications he had in mind
are similar to the one given here. In our notation, the rule $d_c$ is defined as follows
for any given $c > 0$.


dc = [Xi, Xa, • ■ • , X fc ], where 



3 = 1, 2, 


,k 


with 




1 if X(k) — c(S/s/n) < X(j) < Xik) , 
0 otherwise. 


Example 2. Suppose that a manufactured article has a numerical characteristic
$x$, and a given article is "defective" if it has an $x < a$ and "acceptable" otherwise,
where $a$ is some constant. A consumer requires a large number ($N$) of articles,
which can be supplied by each one of $k$ manufacturers $\pi_i$, $i = 1, 2, \dots, k$. The
characteristic (say length) of articles produced by $\pi_i$ is known to have a rectangular
distribution with range from $b$ to $b + c_i$, but the $c_i$'s are not known. As a
preliminary step, the consumer has obtained samples of $\nu$ articles from each
manufacturer, and finds the corresponding lengths to be $X_{ij}$, $j = 1, 2, \dots, \nu$,
$i = 1, 2, \dots, k$. The statistician is asked to suggest how the consumer should
order a total of $N$ articles from the $k$ manufacturers.

If $a \le b$, the number of defective articles received by the consumer will be
zero no matter how the order is placed. Suppose therefore that $a > b$. Then, if
$n_i$ articles are ordered from $\pi_i$ with $\sum_{i=1}^{k} n_i = N$, the expected number of
defectives equals $N - \sum_{i=1}^{k} (n_i/N) g_i$, where $g_i = g(c_i)$ and $g(t)$ is given by

$$g(t) = \begin{cases} N\left(1 - \dfrac{a - b}{t}\right) & \text{if } t > a - b, \\ 0 & \text{otherwise.} \end{cases}$$
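
As a small numerical check of the expression for the expected number of
defectives, the following Python sketch evaluates $g(t)$ and $N - \sum (n_i/N) g_i$
directly. The function names and the numerical values are hypothetical; in the
problem itself the widths $c_i$ are unknown and are supplied here only to
illustrate the formula.

```python
def g(t, N, a, b):
    """g(t) = N * (1 - (a - b) / t) if t > a - b, and 0 otherwise (case a > b)."""
    return N * (1.0 - (a - b) / t) if t > a - b else 0.0


def expected_defectives(orders, widths, a, b):
    """Expected number of defectives, N - sum_i (n_i / N) * g(c_i).

    orders: the n_i ordered from each manufacturer (they sum to N).
    widths: the c_i of the rectangular distributions on [b, b + c_i];
            these are illustrative inputs, unknown in practice.
    """
    N = sum(orders)
    return N - sum((n / N) * g(c, N, a, b) for n, c in zip(orders, widths))


# Two hypothetical manufacturers with c_1 = 1, c_2 = 2 and a - b = 0.5.
all_from_second = expected_defectives([0, 100], [1.0, 2.0], a=0.5, b=0.0)
```

Ordering everything from the manufacturer with the greatest $c_i$ minimizes the
expression, which is why the example reduces to a "problem of the greatest $c_i$."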


Writing $\beta(\omega) = (c_1, c_2, \dots, c_k)$, $\gamma(\omega) = (g_1, g_2, \dots, g_k)$, it is clear that the
expected number of defectives is of the form $W(\omega, s) + h(\omega)$, where $h(\omega)$ is
independent of $s = (n_1/N, n_2/N, \dots, n_k/N)$, and $W$ is defined as in (A).

We have now to consider what statistics $X_i$ should be used to construct decision
rules. Evidently, we are concerned with a "problem of the greatest $c_i$."

(a). Assuming $\nu > 1$, let $X_i = \max_j \{X_{ij}\} - \min_j \{X_{ij}\}$. Since the frequency
function of $X_i$ is $f_i(x) = \nu(\nu - 1) c_i^{-\nu} (c_i - x) x^{\nu - 2}$ if $0 < x < c_i$ and zero
elsewhere, it is a simple matter to show that $c_r < c_s$, $x < y$ imply
$f_r(x) f_s(y) \ge f_s(x) f_r(y)$. It follows that in the class of all impartial rules which are
based on the sample ranges, the uniformly best rule is to order all the $N$ articles from
the manufacturer with the greatest sample range.

(b). It may be objected that since the lower end points of all the distributions
are the same, the use of sample ranges to construct decision rules is not particularly
appropriate. Suppose therefore that one takes the statistics
$X_i^* = \max_j \{X_{ij}\} - b$. The frequency function of $X_i^*$ is
$f_i^*(x) = \nu c_i^{-\nu} x^{\nu - 1}$ for $0 < x < c_i$ and $= 0$ elsewhere, and as before,
condition (D) holds. Hence the uniformly best impartial procedure in this class is to
order all the $N$ articles from the manufacturer who supplied the article with the
greatest length in the whole sample of $k\nu$ articles.

It is important to observe that the uniformly best procedures according to
(a) and (b) are not identical, and choosing between them is outside the scope of
Theorem 1. Note also that the statistics $X_i^*$ are sufficient for the $c_i$'s. Therefore,
corresponding to any decision rule there exists a decision rule which is defined in
terms of the $X_i^*$'s and has the same risk function. In particular, there exists a
decision rule in class (b) which is equivalent to the uniformly best impartial rule
in class (a). It would be interesting to know whether this equivalent rule is also
an impartial one.
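
That the procedures of (a) and (b) need not agree can be seen on a small
made-up sample. The Python sketch below implements both selection rules; the
function names and the sample lengths are hypothetical and serve only to
exhibit a disagreement.

```python
def best_by_range(samples):
    """Rule (a): index of the manufacturer with the greatest sample range."""
    ranges = [max(xs) - min(xs) for xs in samples]
    return ranges.index(max(ranges))


def best_by_maximum(samples, b):
    """Rule (b): index of the manufacturer with the greatest max_j X_ij - b."""
    stats = [max(xs) - b for xs in samples]
    return stats.index(max(stats))


# Hypothetical lengths for two manufacturers, common lower end point b = 0.
samples = [[0.1, 0.9], [0.8, 1.0]]
choice_a = best_by_range(samples)         # ranges are 0.8 and about 0.2
choice_b = best_by_maximum(samples, 0.0)  # maxima are 0.9 and 1.0
```

Here rule (a) selects the first manufacturer and rule (b) the second, so the two
uniformly best impartial procedures lead to different orders on the same data.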

The two examples given above are purely illustrative, and the reader will
readily construct others in which the statistician is faced with similar problems of
decision. The second example does not, strictly speaking, belong to Case 2, and
the reader is urged to consider some specific instances of this Case. There are
various modifications of "the problem of the greatest one" which may be indicated
here very briefly. These modifications are introduced by placing restrictions
on the class of possible decisions. For example, in Example 1 the statistician may
be required to select two or more varieties, and to assign proportions of the land
to the varieties which he selects in such a way that no variety takes more than
two-thirds of the available land. In that case, the uniformly best procedure (in
the class of all impartial procedures which are based on the $\bar X_i$'s and $S^2$) would
be to assign two-thirds of the land to the variety with the greatest observed mean
yield, and the remainder to the variety with the next greatest. The proof is a
slight elaboration of the proof of Theorem 1 and is left to the reader. Again, in
Example 2 the consumer may wish to obtain all the articles which he requires from
some one manufacturer. In that case, assuming that an impartial selection rule
based on the $X_i^*$'s is to be used, it follows trivially from the case considered
previously that the uniformly best procedure is to select the manufacturer with
the greatest $X_i^*$. This is intuitively obvious, but the obvious requires proof (i.e.
verification of (D)), as may be seen by turning to Example 3.

The intuitive notion referred to above is one which is employed quite frequently
in practice. It may be described as follows. Let $X_1$ and $X_2$ be independent
and similar estimates of unknown parameters $m_1$ and $m_2$, and suppose that in a
given instance we have $X_1 > X_2$. "Then it is more reasonable to suppose that
$m_1 > m_2$ than to suppose that $m_1 < m_2$." Theorem 2 shows that this notion
is well-founded if and only if condition (D) is satisfied, with $\beta = (m_1, m_2)$. The
condition states essentially that "the likelihood of the greater estimate corresponding
to the greater parameter is always $\ge$ the likelihood of the contrary
event," and it should be observed that $X_1$, $X_2$ being "good" estimates (e.g.
maximum likelihood estimates) does not ensure that this will be the case. The
following application of Theorem 2 is an illustration of these remarks.

Example 3. Suppose that $\pi_i$, $i = 1, 2$ are Cauchy-type populations having
medians $m_i$, and that the set of possible points $\omega = (m_1, m_2)$ consists of just
the two points $\omega_1 = (1, -1)$ and $\omega_2 = (-1, 1)$. $X_1$ and $X_2$ are single observations
from the two populations, and the statistician is required to decide which
population has the greater median.

Here it would be reasonable for the statistician to use a decision rule, say $d^*$,
which minimizes $r(d \mid \omega) = P(\text{incorrect decision} \mid \omega, d)$, where "$\pi_1$ has the
greater median" and "$\pi_2$ has the greater median" are the two possible decisions.
That this risk function is included in the scheme described by (A) and (C) may
be seen as follows. Let the only admissible values of $s$ be $(1, 0)$ and $(0, 1)$,
corresponding to the decisions "$m_1 > m_2$" and "$m_1 < m_2$" respectively, and setting
$\beta(\omega) = (m_1, m_2)$, define $\gamma(\omega_1) = (1, 0)$, $\gamma(\omega_2) = (0, 1)$. Then for any $d$ such that
$s(d)$ equals $(1, 0)$ or $(0, 1)$ only, the expected value of $W$ is for either $\omega$ the
probability of error in using the rule $d$.

Now, if $d = d(X_{(1)}, X_{(2)}) = [\lambda_1, \lambda_2]$ is any impartial decision rule, it will equal
either $[1, 0]$ or $[0, 1]$, corresponding to the decisions "the population with the
greater $X$ has the smaller median" and "the population with the greater $X$ has
the greater median" respectively. Since the frequency function of $X_i$ is
$f_i(x) = 1/\pi[1 + (x - m_i)^2]$, a little calculation shows that in the class of impartial
decision rules a uniformly best one exists, and is given by

$$d^* = \begin{cases} [1, 0] & \text{if } X_{(1)} X_{(2)} > 2, \\ [0, 1] & \text{otherwise.} \end{cases}$$
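
The "little calculation" can be checked numerically. For $x < y$ and medians
$-1$ and $1$, the inequality $f(x; -1) f(y; 1) < f(x; 1) f(y; -1)$ reduces after
clearing denominators to $(y - x)(xy - 2) > 0$, that is, to $xy > 2$. The Python
sketch below compares the rule $d^*$ with a direct likelihood comparison; the
function names are illustrative assumptions, not notation from the paper.

```python
import math


def cauchy_pdf(x, median):
    """Cauchy frequency function 1 / (pi * [1 + (x - median)^2])."""
    return 1.0 / (math.pi * (1.0 + (x - median) ** 2))


def d_star(x1, x2):
    """[1, 0]: the greater X is attributed to the smaller median; else [0, 1]."""
    lo, hi = min(x1, x2), max(x1, x2)
    return (1, 0) if lo * hi > 2 else (0, 1)


def reversal_more_likely(lo, hi, mu=-1.0, nu=1.0):
    """For lo < hi: is 'lo from median nu, hi from median mu' the likelier pairing?"""
    natural = cauchy_pdf(lo, mu) * cauchy_pdf(hi, nu)
    reversed_ = cauchy_pdf(lo, nu) * cauchy_pdf(hi, mu)
    return natural < reversed_


decision = d_star(2.0, 3.0)  # product 6 exceeds 2
```

With observations 2 and 3 the rule therefore attributes the greater observation
to the population with the smaller median, contrary to the intuitive notion
discussed above.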


In conclusion, we remind the reader that although the weight function W 
defined according to (A) is general enough to include all problems of the type 
considered in this paper, the sampling scheme, as well as the class of decision rules
to which our results apply, is very limited. We have (i) assumed that the samples
from the k populations are all of the same size, and (ii) given no objective criterion 
for choosing appropriate statistics, and no justification for the use of impartial 
decision rules based on these “appropriate statistics.” In view of the applications, 
it would be of interest to extend the general argument of this paper to the 
numerous situations where Theorem 1 does not apply or is otherwise unsuitable. 

The problem of selection was suggested to the author by Professor Hotelling. 
The author would like to acknowledge his indebtedness also to Professor Robbins. 
This paper could not have been written without his constant encouragement and 
helpful suggestions. 


REFERENCES 

[1] Edward Paulson, "A multiple decision procedure for certain problems in analysis of
variance," Annals Math. Stat., Vol. 20 (1949), pp. 95-98.

[2] Frederick Mosteller, "A k-sample slippage test for an extreme population," Annals
Math. Stat., Vol. 19 (1948), pp. 58-65.

[3] Abraham Wald, On the Principles of Statistical Inference, Notre Dame Mathematical
Lectures, No. 1, (1942), Notre Dame, Indiana.

[4] Herbert Robbins, "Mixture of distributions," Annals Math. Stat., Vol. 19 (1948),
pp. 360-369.



COMPLETENESS IN THE SEQUENTIAL CASE 

By E. L. Lehmann and Charles Stein 
University of California, Berkeley

1. Summary. Recently, in a series of papers, Girshick, Mosteller, Savage and
Wolfowitz have considered the uniqueness of unbiased estimates depending only
on an appropriate sufficient statistic for sequential sampling schemes of binomial
variables. A complete solution was obtained under the restriction to bounded
estimates. This work, which has immediate consequences with respect to the
existence of unbiased estimates with uniformly minimum variance, is extended
here in two directions. A general necessary condition for uniqueness is found,
and this is applied to obtain a complete solution of the uniqueness problem when
the random variables have a Poisson or rectangular distribution. Necessary
and sufficient conditions are also found in the binomial case without the restriction
to bounded estimates. This permits the statement of a somewhat stronger
optimum property for the estimates, and is applicable to the estimation of
unbounded functions of the unknown probability.


2. Introduction. The notions of completeness and bounded completeness of
a family of distributions were introduced in [1, 2] in connection with the problems
of similar regions and unbiased estimation. The question of whether either
of these two properties pertains to various families of distributions that are of
interest in statistics was discussed in [2] under the assumption of fixed sample
size. The only sequential problems of this kind that have been treated in the
literature (with quite different terminology) refer to the binomial case. For
this case Girshick, Mosteller and Savage [3] found necessary (and also certain
sufficient) conditions on the sequential sampling scheme for completeness, while
Wolfowitz [4] and Savage [5] gave necessary and sufficient conditions for bounded
completeness.

If $T$ is a random variable distributed over an additive class of sets in some
space according to a distribution $P_\theta^T$ with $\theta$ in some set $\omega$, then the family
$\mathcal{P}^T = \{P_\theta^T \mid \theta \in \omega\}$ of possible distributions of $T$ is said to be complete if

$$(1) \qquad \int f(t)\, dP_\theta^T(t) = 0 \quad \text{for all } \theta \in \omega$$

implies

$$(2) \qquad f(t) = 0, \quad \text{a.e. } \mathcal{P}^T,$$

that is, for all $t$ except possibly in a set $N$ for which $P_\theta^T(N) = 0$ for all $\theta \in \omega$.
The family $\mathcal{P}^T$ is said to be boundedly complete if this implication holds under
the assumption that $f$ is bounded.

The relation of these concepts to the problem of unbiased estimation is an
immediate consequence of a theorem of Blackwell [6]. Let $X$ be a random variable
with distribution $P_\theta^X$, $\theta \in \omega$, and let $T$ be a sufficient statistic for $\theta$. Denote
by $P_\theta^T$ the distribution of $T$, and suppose that $\mathcal{P}^T$ is complete. Then every
function $g(\theta)$ for which there exists an unbiased estimate, that is, a function $\phi$ such
that

$$E_\theta\, \phi(X) = g(\theta), \quad \text{for all } \theta \in \omega,$$

possesses an unbiased estimate with uniformly minimum variance. One can say
furthermore that if $\phi(X)$ is any unbiased or bounded unbiased estimate of $g(\theta)$,
then the optimum estimate guaranteed by the above statements is the conditional
expectation of $\phi(X)$ given $T$.
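
A fixed-sample illustration may help here. For $n$ independent binomial trials
the total number of successes $T$ is sufficient, and, given $T = t$, every sequence
with $t$ successes is equally likely whatever the value of $p$. The Python sketch
below, with hypothetical function names, computes the conditional expectation
of the crude unbiased estimate $\phi(X) = X_1$ given $T$ by exact enumeration.

```python
from fractions import Fraction
from itertools import product


def conditional_expectation_given_total(phi, n):
    """E[phi(X) | T = t] for n Bernoulli trials X and T = sum(X).

    Given T = t all sequences with t successes are equally likely,
    independently of p, so the conditional expectation is a plain average.
    """
    sums, counts = {}, {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        sums[t] = sums.get(t, Fraction(0)) + Fraction(phi(x))
        counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}


# phi(X) = X_1 is unbiased for p; conditioning on T gives T / n.
improved = conditional_expectation_given_total(lambda x: x[0], n=4)
```

The result is $T/n$, the usual estimate of $p$, which is the function of the
sufficient statistic that the conditional expectation produces in this case.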

The aim of the present paper is to obtain certain results concerning completeness
in sequential sampling schemes. Some necessary conditions for completeness
are given in section 3, and these are used to obtain necessary and sufficient
conditions for completeness when the random variable being sampled has a
Poisson or rectangular distribution. In section 4 it is shown that certain necessary
conditions given in [3] for the binomial case are also sufficient.

3. A necessary condition for completeness. The sequential sampling schemes
with which we are concerned are of the following nature. There is given a sequence
of real valued random variables $X_1, X_2, \dots$ with a joint distribution depending
on a real parameter $\theta$, which ranges over a set $\omega$. We shall assume that for
each $m$ the set of variables $X_1, \dots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \dots, X_m)$ for $\theta$, and that for each $m$ the family $\mathcal{P}^{T_m}$ of distributions
of $T_m$ is complete. We next suppose that there is given a stopping rule,
which is such that after $m$ observations have been taken, the decision of whether
or not to take an $(m+1)$st observation depends only on the value of
$t_m(X_1, \dots, X_m)$. It follows (see [6]) that if the total number of observations is $n$
(a random variable which may be infinite), then $(T_n, n)$ is a sufficient statistic
for $\theta$. We shall say that the sequential procedure is complete if the family of
distributions of $(T_n, n)$ is complete. Throughout, we shall assume that all
sequential procedures in question are closed, i.e. that for each $\theta \in \omega$, $n$ is finite
with probability 1.

Let $Y$ be a random variable distributed over a Euclidean space according to
a distribution $P_\theta$ with $\theta$ in $\omega$. We shall say that a point $y$ lies in the positive
sample space of $Y$ if there exists $\theta \in \omega$ such that every open set containing $y$
has positive probability for this $\theta$, and that $y$ is an impossible point if it lies in
the complement of the positive sample space. Consider now a sequential sampling
scheme as described above. For any integers $m < p$ we shall denote by $W_p^m$ the
positive sample space of $T_p$ given the first $m$ steps of the stopping rule, that is,
given for $i = 1, \dots, m$ the set of values of $T_i$ for which sampling is discontinued
after the $i$th observation. Since all the $T$'s are real valued, the sets $W_p^m$
are sets of real numbers satisfying the obvious condition $W_p^m \supseteq W_p^{m+1}$. The
union $\bigcup S_m$ ($S_m$ is the set of points of $W_m^{m-1}$ for which no $(m+1)$st observation is
taken) will be called the set of stopping or boundary points; the points belonging
to some $W_m^{m-1} - S_m$ are the continuation points.

We need the following

Lemma 1. A necessary condition for a sequential procedure of the type described
above to be complete is that every procedure obtained from the given one by
truncation be complete.$^1$

This is an immediate consequence of the following more general 

Lemma 2. Let $X_1, X_2, \dots$ be as before a sequence of random variables such
that for each $m$ the set $X_1, \dots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \dots, X_m)$. Let $\Sigma_1, \Sigma_2, \dots, \Sigma_r$ each be a complete, closed, sequential
procedure based on these sufficient statistics. Let $\Sigma_1 \cup \Sigma_2 \cup \dots \cup \Sigma_r$ denote the
sequential procedure according to which we continue taking observations until at least one
of the stopping rules $\Sigma_1, \dots, \Sigma_r$ tells us to stop. Then the procedure $\Sigma_1 \cup \dots \cup \Sigma_r$
is complete.

This clearly implies Lemma 1. For if one takes for $\Sigma_1$ any closed, complete
sequential procedure and for $\Sigma_2$ a procedure of fixed sample size, then $\Sigma_1 \cup \Sigma_2$
is the associated truncated procedure.
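
The combined procedure of Lemma 2 stops at the smaller of the individual
sample sizes. The Python sketch below makes this concrete; representing a
stopping rule as a function of $(m, t_m)$ and the names `stopping_time` and
`union_rule` are assumptions made for the illustration.

```python
def stopping_time(rule, ts):
    """First m (counting from 1) at which rule(m, t_m) says stop, else None."""
    for m, t in enumerate(ts, start=1):
        if rule(m, t):
            return m
    return None


def union_rule(*rules):
    """Stop as soon as at least one of the given stopping rules says stop."""
    return lambda m, t: any(rule(m, t) for rule in rules)


# Hypothetical values of T_1, ..., T_5 and two rules depending only on (m, T_m).
ts = [1, 2, 4, 7, 11]
threshold = lambda m, t: t >= 7   # a sequential rule
fixed_size = lambda m, t: m >= 3  # a procedure of fixed sample size 3
n_union = stopping_time(union_rule(threshold, fixed_size), ts)
```

Taking the second rule to be of fixed sample size, as here, gives exactly the
truncation used to deduce Lemma 1.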

Proof of Lemma 2. It is sufficient to prove the result for the case $r = 2$.

Let $n_1, n_2, n$ denote the number of observations taken under $\Sigma_1, \Sigma_2, \Sigma_1 \cup \Sigma_2$
respectively. Then $n = n_1$ if $n_1 \le n_2$, $n = n_2$ if $n_1 \ge n_2$. Let $f$ be any function
on $\Sigma_1 \cup \Sigma_2$ such that

$$E_\theta f(T_n, n) = 0 \quad \text{for all } \theta \in \omega.$$

Then

$$E_\theta E[f(T_n, n) \mid T_{n_1}, n_1] = 0, \qquad E_\theta E[f(T_n, n) \mid T_{n_2}, n_2] = 0, \quad \text{for all } \theta \in \omega.$$

Since $\Sigma_1$ and $\Sigma_2$ are complete it follows that

$$E[f(T_n, n) \mid T_{n_1} = t_1, n_1 = \gamma_1] = E[f(T_n, n) \mid T_{n_2} = t_2, n_2 = \gamma_2] = 0, \quad \text{a.e.}$$

Hence

$$(3) \qquad 0 = P(n_1 = n \mid T_{n_1} = t_1, n_1 = \gamma_1)\, f(t_1, \gamma_1)
+ P(n_1 > n_2 \mid T_{n_1} = t_1, n_1 = \gamma_1)\, E[f(T_{n_2}, n_2) \mid T_{n_1} = t_1, n_1 = \gamma_1, n_1 > n_2],$$

and the analogous condition holds with the subscripts 1 and 2 interchanged.

We shall prove that $f(T_n, n) = 0$, a.e., by induction over the possible values
of $n$. Suppose, therefore, that for some integer $m$

$$P_\theta\{n \le m,\ f(T_n, n) \ne 0\} = 0.$$

(This is certainly true for $m = 0$.) It then follows that if we take $\gamma_1 = m + 1$
in (3) the second term of the right hand side vanishes, so that

$$0 = P(n = n_1 \mid T_{n_1} = t_1, n_1 = m + 1)\, f(t_1, m + 1).$$

1 The authors would like to thank Mr. E. Fay for pointing out an error in the original
proof of this Lemma.





Hence,

$$P_\theta\{n = n_1 = m + 1,\ f(T_{n_1}, n_1) \ne 0\}
\le P_\theta\{n = n_1 = m + 1,\ P(n = n_1 \mid T_{n_1}, n_1) = 0\} = 0.$$

Analogously we see that

$$P_\theta\{n = n_2 = m + 1,\ f(T_{n_2}, n_2) \ne 0\} = 0$$

and, adding, that

$$P_\theta\{n = m + 1,\ f(T_n, n) \ne 0\} = 0.$$

This completes the induction.

We need further the notion of strong completeness. Consider a random
variable $W = (U, V)$, suppose that the distribution of $W$ depends on $\theta$, and that
$U$ is a sufficient statistic for $\theta$. Let $P_u^V$ be the conditional distribution of $V$ given
$U = u$ (this is independent of $\theta$ since $U$ is a sufficient statistic for $\theta$) and let
$\mathcal{P}^{V*} = \{P_u^V\}$. We say that the pair $\mathcal{P}^W$, $\mathcal{P}^{V*}$ is strongly complete if the conditions

(i) $E_\theta f(V)$ exists for all $\theta$,

(ii) $E(f(V) \mid U = u) = 0$ for almost all $u$,

imply

$$f(v) = 0, \quad \text{a.e. } \mathcal{P}^V.$$

For brevity, we shall then usually say that $\mathcal{P}^{V*}$ is strongly complete.

We can now state the following necessary condition for completeness.

Theorem. If a closed sequential procedure of the type considered above is complete,
then

(i) $S_m$ is almost empty for every $m$ for which $W_{m+1}^{m-1} - W_{m+1}^{m}$ is almost empty,

(ii) for each $m$ for which $S_m$ is not almost empty, the family of conditional
distributions of $T_m$ given $T_{m+1} = t$ (as $t$ ranges over $W_{m+1}^{m-1} - W_{m+1}^{m}$) is strongly complete.

Proof. For any $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$ the positive sample space of $T_m$ given $T_{m+1} = t$
is clearly contained in $S_m$. Suppose first that (ii) is violated and consider the
sequential procedure obtained from the given one by truncation after $m + 1$
observations. By the lemma it will be enough to show that the truncated procedure
is not complete. For this purpose let us assume that regardless of the stopping
rule all $m + 1$ variables $X_1, \dots, X_{m+1}$ are observed. We want to construct an
estimate of zero based on the sufficient statistic for the truncated procedure.
This estimate must be a function of $T_1$ for $T_1 \in S_1$, of $T_2$ for $T_2 \in S_2$, etc. That is,
although we may imagine that the full sample of size $m + 1$ is taken, we must
be careful not to use observations that are impossible when the stopping rule
is followed.

We shall now show that there exists an unbiased estimate of zero which is
zero over $S_1, \dots, S_{m-1}$, equal to $f(T_m)$ on $S_m$ and $g(T_{m+1})$ on $W_{m+1}^{m}$, where $f$
and $g$ will be defined below. Since expectation equals expectation of conditional
expectation, a statistic is an unbiased estimate of zero if its expectation exists
and its conditional expectation given $T_{m+1} = t$ is zero for almost all $t$. In our
case this condition is equivalent to

$$(4) \qquad \int_{S_m} f(u)\, dP_m(u \mid T_{m+1} = t) + g(t) \int_{W_m^{m-1} - S_m} dP_m(u \mid T_{m+1} = t) = 0$$

for almost all $t \in W_{m+1}^{m}$,

$$(5) \qquad \int_{S_m} f(u)\, dP_m(u \mid T_{m+1} = t) = 0$$

for almost all $t \notin W_{m+1}^{m}$, i.e. for almost all $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$, since
$t \notin W_{m+1}^{m-1}$ implies $P(S_m \mid T_{m+1} = t) = 0$,
together with the existence of $E_\theta(f(T_m) \mid n = m)$ and $E_\theta(g(T_{m+1}) \mid n = m + 1)$.
Since (ii) does not hold there exists $f$ not vanishing a.e. such that
$E_\theta(f(T_m) \mid n = m)$ exists and (5) is satisfied. If $g$ is defined by (4),
$E_\theta(g(T_{m+1}) \mid n = m + 1)$ exists, and this completes the proof of the necessity
of (ii).

The necessity of (i) is now obvious. For if (i) is violated, then (5) is satisfied
vacuously, and we can take $f$ to be an arbitrary positive valued function (for
example) and (4) will then be satisfied.

As immediate consequences of this theorem we shall obtain two conditions,
which are easier to apply than condition (ii).

Corollary 1. A necessary condition for completeness is that for no $m$ there
exists a subset $A$ of $S_m$ such that

$$P_\theta(A) > 0 \quad \text{for some } \theta$$

and

$$P(A \mid T_{m+1} = t) = 0 \quad \text{for almost all } t \in W_{m+1}^{m-1} - W_{m+1}^{m}.$$

Corollary 2. Suppose that the sequence of $X$'s is such that in the nonsequential
case for all $m, p$ with $m < p$ the positive sample space of $T_m$ given $T_p = t$ is the
intersection of the unconditional positive sample space of $T_m$ with the interval $[0, t]$.
Then a necessary condition for a sequential procedure to be complete is that each
$S_m$ differ from a half-open interval (possibly empty) $[a_m, b_m)$ with $a_m \le b_m$, $a_1 = 0$,
$a_{m+1} = b_m$, by a set of probability 0.

Proof. Let $r$ be the first value of $m$ for which this condition is not satisfied.
Then there exists $c > b_{r-1}$ such that the sets $S_r \cap [c, \infty)$ and $S_r \cap [b_{r-1}, c)$ both
have positive probability. The result now follows from Corollary 1 if one puts
$A = S_r \cap [c, \infty)$.

Next we consider some examples. 

Example 1. Let $X_1, X_2, \dots$ be independently normally distributed with
known variance and unknown mean $\theta$. In this case $T_m = \sum_{i=1}^{m} X_i$, and since
the positive sample space of $T_{m+1}$ is the infinite interval regardless of the values
of $T_1, \dots, T_m$ it follows from condition (i) of the theorem that no sequential
procedure is complete, with the trivial exception of the procedures with fixed
sample size.





Example 2. Let $X_1, X_2, \dots$ be independently uniformly distributed over
the interval $(0, \theta)$, $0 < \theta < \infty$. Then $T_m = \max(X_1, \dots, X_m)$ and Corollary 2
gives a necessary condition for completeness. If the procedure is truncated we
can deduce sufficiency of this condition from (5). However, this proof does not
apply to the general case. The following proof of sufficiency is similar to some
of the proofs in [3, 4, 5].

Suppose $S_1, S_2, \dots$ form a set of adjoining intervals (some of them possibly
empty), $S_m = [a_m, b_m)$, and suppose there is a non-zero unbiased estimate of
zero, $\phi = \phi(T_n, n)$. Let $m$ be the smallest integer for which $\phi$ is not zero almost
everywhere on $S_m$. Then

$$E_\theta(\phi) = P_\theta(n = m)\, E_\theta(\phi \mid n = m) + \sum_{j=m+1}^{\infty} P_\theta(n = j)\, E_\theta(\phi \mid n = j) = 0,$$

and hence

$$(6) \qquad P_\theta(n = m)\, E_\theta(\phi \mid n = m) = - \sum_{j=m+1}^{\infty} P_\theta(n = j)\, E_\theta(\phi \mid n = j).$$

Now the right hand side of (6) is zero when $\theta \le b_m$, since it is then impossible
that $T_j \in S_j$ for any $j > m$. Hence

$$E_\theta[\phi(T_n, n) \mid a_m \le T_m < b_m] = 0 \quad \text{for all } \theta \le b_m,$$

and therefore

$$\int_{a_m}^{\theta} \phi(x, m)\, x^{m-1}\, dx = 0 \quad \text{for all } \theta \text{ in } [a_m, b_m].$$

But this implies $\phi(x, m) = 0$ almost everywhere in $S_m$, which is a contradiction.

Example 3. Let $X_1, X_2, \dots$ be independently distributed according to a
Poisson distribution with mean $\theta$. Then $T_m = \sum_{i=1}^{m} X_i$ and again we can apply
Corollary 2. To prove sufficiency we proceed as in example 2. If the condition of
Corollary 2 is satisfied we may write without ambiguity $\phi(T_n)$ for $\phi(T_n, n)$.
Let $c$ be the smallest value of $T_n$ for which $\phi(T_n) \ne 0$. Then if the probability
of $T_n = j$ is $k(j)\, \theta^j e^{-\theta m_j}$, the identity $E_\theta(\phi) = 0$ implies

$$\phi(c)\, k(c)\, \theta^c e^{-\theta m_c} = - \sum_{j=c+1}^{\infty} \phi(j)\, k(j)\, \theta^j e^{-\theta m_j}.$$

Dividing this equation by $\theta^c$ and letting $\theta$ tend to zero we see that the right
hand side tends to zero, which implies $\phi(c) = 0$ and hence a contradiction.


4. The binomial case. As was mentioned in section 1, the problem of bounded
completeness was solved for the binomial case in [3, 4, 5]. Since presumably one is
unwilling to estimate the bounded parameter $p$ by means of an unbounded
estimate, further work here may seem unnecessary. However, the problem of
completeness seems to be of interest for two reasons. If the procedure is boundedly
complete without being complete then, even though one may be reluctant
to use such an estimate, there may exist an unbounded unbiased estimate of $p$,
which for some values of $p$ has smaller variance than the minimum variance
bounded estimate. (An example of this is given in [2].) Since this possibility is
ruled out when the procedure is complete it is seen that completeness permits
statement of a stronger optimum property. Apart from this one may be interested
in estimating some unbounded function of $p$ such as $1/p$. In this case bounded
completeness does not permit any statements concerning existence of optimum
estimates.

In the present section we shall change our notation somewhat. We are concerned
with a sequence of independent trials with constant probability $p$ of
success. On the basis of $m$ trials the total number $y$ of successes is a sufficient
statistic for $p$. Instead of representing the sufficient statistic for the sequential
procedure by $(y, n)$, we shall use the representation $(x, y)$ where $x$ is the total
number of failures, so that $x + y = n$. The couples $(x, y)$ may be thought of as
making up the points with integral-valued coordinates of the first quadrant
of an $xy$-plane, and as before may be classified as boundary points, continuation
points, and impossible points. Adopting the terminology of [3], we shall call
the value of $x + y$ the index of the point $(x, y)$, so that the points of index $m$
lie on the line $x + y = m$.
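
The lattice picture can be made concrete with a short Python sketch. It counts,
for a bounded procedure, the number $c(x, y)$ of paths from the origin to each
boundary point, so that the probability of the point is $c(x, y)\, p^y q^x$ with
$q = 1 - p$, and checks closure by summing these probabilities. Describing the
stopping rule through a predicate `is_continuation`, and the function names
themselves, are assumptions made for this illustration.

```python
from fractions import Fraction


def boundary_path_counts(is_continuation, max_index):
    """Count lattice paths from (0, 0) to each boundary point.

    is_continuation(x, y) is a hypothetical description of the stopping rule:
    sampling continues at (x, y) (x failures, y successes) exactly when it
    returns True. Only points of index x + y <= max_index are explored, so the
    procedure is assumed to stop by then.
    """
    paths = {(0, 0): 1}  # number of ways of arriving at each accessible point
    boundary = {}
    for index in range(max_index + 1):
        current = [(pt, c) for pt, c in paths.items() if sum(pt) == index]
        for (x, y), c in current:
            if is_continuation(x, y):
                for nxt in ((x + 1, y), (x, y + 1)):  # a failure, a success
                    paths[nxt] = paths.get(nxt, 0) + c
            else:
                boundary[(x, y)] = c
    return boundary


def termination_probability(boundary, p):
    """Sum of c(x, y) * p^y * q^x over the boundary points found."""
    q = 1 - p
    return sum(c * p**y * q**x for (x, y), c in boundary.items())


# Stop at the first success, or after three trials at the latest.
rule = lambda x, y: y == 0 and x + y < 3
points = boundary_path_counts(rule, max_index=3)
total = termination_probability(points, Fraction(1, 3))
```

For this rule the boundary points are $(0, 1)$, $(1, 1)$, $(2, 1)$ and $(3, 0)$, each
reached by a single path, and their probabilities sum to unity.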

Girshick, Mosteller and Savage defined a sequential procedure to be simple
if for each $m$ the continuation points of index $m$ form an interval. They proved
that a necessary and sufficient condition for a bounded procedure to be complete
is that it be simple. (A procedure is said to be bounded if there exists $N$
so that the number of observations is $\le N$.) They also showed that in general
simplicity is not sufficient for completeness. However, it was shown later [4, 5]
that simplicity is sufficient for bounded completeness.

A sequential procedure is said to be closed if the probability of termination is
unity for every $p$ with $0 < p < 1$. It was proved by Girshick, Mosteller and
Savage that a necessary condition for completeness of a closed sequential procedure
is that no procedure obtained from the given one by removing a boundary
point be closed. (Removing a boundary point here means converting it into a
continuation point.) We shall prove below that this condition together with
simplicity is also sufficient for completeness. An interesting question is whether
these two conditions are sufficient for completeness for the general sequential
schemes considered in section 2, when simplicity is replaced by the condition
that every procedure obtained from the given one by truncation is complete,
and when the second condition is modified by the appropriate null set qualifications.
It is easily seen that both of these conditions are necessary.

The following definitions will be needed below. A boundary point $(a, b)$ is a
lower (upper) boundary point if for some $x < 0$ ($> 0$) the point $(a + x, b - x)$
is a continuation point. An impossible point $(a, b)$ is a lower (upper) impossible
point if for some $x < 0$ ($> 0$) the point $(a + x, b - x)$ is either a continuation
point or a boundary point.





If the procedure is unbounded every boundary point is either a lower or an 
upper boundary point. If it is simple, no point can be both an upper and a lower 
boundary point. The same remarks apply to impossible points. 

Theorem. A necessary and sufficient condition for completeness of a closed
procedure in the binomial case is that

(i) the procedure is simple,

and

(ii) the removal of any boundary point destroys closure.

Proof. Necessity was proved in [3] as was sufficiency for bounded procedures.
Sufficiency for unbounded procedures will follow from the following two facts,
which we shall prove below.

I. Suppose (i) holds and there exist numbers $a, M > 0$ such that for all boundary
points $(x, y)$ of index $m \ge M$ the ratio $y/x \ge a$. Let $f(x, y)$ be a non-zero
unbiased estimate of zero defined over the set $B$ of boundary points, and let $m_0$
be the smallest index for which there are points with $f(x, y) \ne 0$. Then $f(x, y) = 0$
for all lower boundary points of index $m_0$.

II. If (i) holds and if for every positive number $a$ there exist infinitely many
boundary points $(x, y)$ with $y/x \le a$, then one may remove any lower boundary
point without destroying closure.

Suppose now that a sequential procedure satisfies (i) and (ii). Then, since no
lower boundary point can be removed without destroying closure, it follows
from II. that there exist a and M such that y/x ≥ a for all boundary points of
index ≥ M. Hence if f(x, y) is an unbiased estimate of zero, and if $m_0$ is defined
as in I., f(x, y) = 0 for all lower boundary points of index $m_0$. Because of
symmetry the statements concerning upper boundary points analogous to I. and II.
also hold. It then follows analogously that f(x, y) = 0 for all upper boundary
points of index $m_0$. But for a simple unbounded procedure every boundary
point is either an upper or a lower boundary point, and hence we obtain a
contradiction with the definition of $m_0$.

Before proving I. and II. we state the following corollary, which generalises 
an example given in [3]. 

Corollary. A sequential procedure that is not bounded and that has a finite
non-zero number of lower boundary points is not complete. The analogous result
holds for upper boundary points.

Proof of Corollary. This follows easily from II., since if a procedure of
this type is to be closed there must exist for each a > 0 infinitely many upper
boundary points (x, y) with y/x ≤ a.

In the remainder of the paper we are concerned with the proofs of I. and II.
Proof of I. Assume I. to be false, and let $(x_0, y_0)$ be the lowest boundary
point of index $m_0$ for which $f(x_0, y_0) \ne 0$. Then $y > y_0$ for all other boundary
points (x, y) for which f(x, y) ≠ 0. Hence if the probability of a point (x, y)
is $c(x, y)\,p^y q^x$ and if k(x, y) = c(x, y) f(x, y),

$$k(x_0, y_0)\, p^{y_0} q^{x_0} = -\sum k(x, y)\, p^{y} q^{x},$$



384 


E. L. LEHMANN AND CHARLES STEIN


where the summation extends over all boundary points of index $\ge m_0$ for which
$y > y_0$. Dividing both sides by $p^{y_0}$ we see that

$$k(x_0, y_0)\, q^{x_0} = -p \sum k(x, y)\, p^{y - y_0 - 1} q^{x}.$$

If we can show that the expression multiplying -p on the right hand side
remains bounded as p tends to zero, we have a contradiction. For letting p
tend to zero, we would then see that the right hand side tends to zero and the
left hand side to $k(x_0, y_0)$, and hence that $f(x_0, y_0) = 0$.

To prove this, note that

$$\Big|\sum k(x, y)\, p^{y - y_0 - 1} q^{x}\Big| \le \sum |k(x, y)|\, p^{y - y_0 - 1}.$$

The right hand side is a power series in p. We shall show that this series
converges for some $p_0 > 0$. This implies uniform convergence for $|p| < p_0$, and
therefore the series remains bounded as p tends to zero. By assumption there exist
numbers a and M' such that y/x ≥ a for all boundary points with y > M'. From
now on we shall consider all series as being summed over the set of boundary
points for which y > M' and hence $q^x \ge q^{y/a}$. Since only a finite number of
terms are omitted this does not affect any convergence properties.

Let $0 < p_1 < 1$. Then, since f is an unbiased estimate of zero, the series

$$\sum k(x, y)\, p_1^{y} q_1^{x}$$

converges absolutely. Hence, so does 

S I k{x, y) | P r y o- 1 8 r k,,,+,) it S | k(x, y ) | (qb)^ 1 = S | k(x, y ) | pV v '~\ 

and consequently the last series is convergent. 

Proof of II. Let R be any closed simple procedure satisfying the conditions
of II., and let $(x_0, y_0)$ be any lower boundary point of R. We denote by R* the
procedure obtained from R by taking $(x_0, y_0)$ to be a continuation point and
by n* the number of observations for R*.

We first prove that any upper impossible point of R is also an impossible 
point of R*. The negation of this would imply that one can get from a lower 
boundary point to an upper impossible point going only through impossible 
points. This would require at least one step of either of the following kinds: 
Lower impossible point —> upper impossible point, 

Lower boundary point —> upper impossible point. 

One can easily convince oneself with the aid of a diagram that any procedure 
under which such steps are permitted cannot be simple. 

Let 0 < p, π < 1, and let a be such that 0 < a < p/q. If p is the true
probability of success, y/x tends in probability to p/q, and hence there exists N
such that

P(y/x ≥ a | p) > π

whenever the index of (x, y) exceeds N. By assumption there exists $N_1 > N$
and a boundary point $(x_1, y_1)$ of R* of index $N_1$ such that $y_1/x_1 \le a$. Then the
probability exceeds π that the random point (x, y) of index $N_1$ will lie above
$(x_1, y_1)$. Since $(x_1, y_1)$ is a boundary point, the probability is therefore greater
than π that the point (x, y) of index $N_1$ is either an upper impossible point for
R and hence impossible for R*, or a stopping or continuation point for R. We
have therefore proved that the probability is > π that either $n^* \le N_1$ or the
point (x, y) of index $N_1$ is a continuation point of R.

But given that one has reached a continuation point (a, b) of R, there exists
$N_2$ such that

$$P(n^* \le N_2 \mid (a, b)) \ge \pi.$$

For

$$P(n^* > N_2 \mid (a, b)) = P(n > N_2 \mid (a, b)) \to 0 \quad \text{as } N_2 \to \infty.$$

Since there are only a finite number of continuation points of index $N_1$, it is
now clear that there exists $N_0$ such that

$$P(n^* \le N_0 \mid p) \ge \pi + \pi - 1,$$

which can be made arbitrarily close to 1 by proper choice of π. Therefore R*
is closed.


REFERENCES 

[1] E. L. Lehmann and H. Scheffé, “On the problem of similar regions,” Proc. Nat. Acad.
Sci., Vol. 33 (1947), pp. 382-386.

[2] E. L. Lehmann and H. Scheffé, “Completeness, similar regions and unbiased
estimation,” unpublished.

[3] M. A. Girshick, Frederick Mosteller and L. J. Savage, “Unbiased estimates for
certain binomial sampling problems, with applications,” Annals of Math. Stat.,
Vol. 17 (1946), pp. 13-23.

[4] J. Wolfowitz, “On sequential binomial estimation,” Annals of Math. Stat., Vol. 17
(1946), pp. 489-493.

[5] L. J. Savage, “A uniqueness theorem for unbiased sequential binomial estimation,”
Annals of Math. Stat., Vol. 18 (1947), pp. 295-297.

[6] D. Blackwell, “Conditional expectation and unbiased sequential estimation,”
Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.



SOME ESTIMATES AND TESTS BASED ON THE r SMALLEST VALUES
IN A SAMPLE

By John E. Walsh¹

The Rand Corporation 

1. Summary. Let us consider a situation where only the r smallest values of
a sample of size n are available. This paper investigates the case where n is
large and r is of the form $pn + O(\sqrt{n})$.

Properties of some well known non-parametric point estimates, confidence
intervals and significance tests for the 100p% point of the population are
investigated. If the sample is from a normal population, these non-parametric
estimates and tests have high efficiencies for small values of p (at least 95%
if p ≤ 1/10).

The other results of the paper are restricted to the special case of a normal
population. Asymptotically “best” estimates and tests for the population
percentage points are derived for the case in which the population standard
deviation is known. For the case in which the population standard deviation is
unknown, asymptotically most efficient estimates and tests can be obtained
for the smaller population percentage points by suitable choice of p and $O(\sqrt{n})$.

The results derived have application in the field of life testing. There the 
variable associated with an item is the time to failure and the r smallest sample 
values can be obtained without the necessity of obtaining the remaining values 
of the sample. By starting with a larger number of units but stopping the experi¬ 
ment when only a small percentage of the units have “died”, it is often possible 
(using the results of this paper) to obtain the same amount of “information” 
with a substantial saving in cost and time over that which would be required 
if a smaller number of units were used and the experiment conducted until all 
the units have “died”. Jacobson called attention to applications of this type
in [1].

2. Introduction and statement of results. In life testing, information con¬ 
cerning the smaller population percentage points may be of primary interest. 
The principal aim of this paper is to investigate the properties of some well 
known non-parametric estimates and tests of the smaller population percentage 
points which are based on statistics of the type used for the sign test. These 
non-parametric results are easy to apply and have several other desirable prop¬ 
erties (see Theorem 1 and its discussion). In particular, if the 100p% point 
is to be investigated, it is only necessary to fail approximately 100p% of the 
number of starting items to obtain the required statistics (n large). Thus, if
the non-parametric results should also happen to be reasonably efficient, they
would appear to be ideal for a life testing situation where a smaller population
percentage point is to be investigated.

¹ The author would like to express his appreciation to Max Halperin for calling
attention to this problem and for valuable advice and assistance in the preparation
of the paper.

Examination shows that life tests of the “wear out” type sometimes yield
empirical distributions which are approximately normal. Also in many cases an
approximately normal distribution can be obtained by an appropriate monotonic
change of variable. Thus the case in which the n observations are a sample from
a normal population will receive special consideration in this paper.

Investigation of the efficiency of the non-parametric estimates and tests will 
be limited to the situation where the n observations are a sample from a normal 
population. Three cases will be considered: 

(A). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the entire sample
(population variance unknown).

(B). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the $pn + O(\sqrt{n})$
smallest order statistics for the situation where the variance of the
normal population is known.

(C). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the $\beta n + O(\sqrt{n})$
smallest order statistics where β is slightly greater than p (population
variance unknown).

The definition of “asymptotic” efficiency together with some of its properties 
is given in Section 3. Only asymptotic efficiencies will be considered. 2 However, 
the efficiencies obtained for the asymptotic case would seem to represent lower 
bounds of the efficiencies for the corresponding non-asymptotic cases since
experience indicates that the efficiency of non-parametric results usually
decreases as the sample size increases.

First let us consider case (A). From Theorem 3, the asymptotically most
efficient results for estimating or testing the 100p% population point on the basis
of the entire sample (population variance unknown) are furnished by the
non-central t-statistic. An expression for the asymptotic efficiency of the
non-parametric results as compared with the corresponding results based on the
non-central t-statistic is given in the Corollary to Theorem 3. The reciprocal of this
efficiency represents the factor by which the original number of starting items
must be multiplied if the non-parametric results are to asymptotically furnish
the same “information” as the non-central t-statistic applied to the original
number of starting items. Table 1 contains values of this factor. Although a larger
number of starting items are used by the “information equivalent” non-parametric
results, a noticeably smaller number of items are failed. The factor by
which the number of items failed is decreased equals the value of p multiplied
by the factor by which the number of starting items was increased for the
“equivalent” non-parametric result. Table 2 contains a list of some of the resulting
factors.

2 Some power function comparisons for the non-asymptotic case were given by Paul
H. Jacobson in [1].

Next consider case (B). The first step in the analysis for this case consists in
obtaining the asymptotically most efficient results. These derivations are
contained in Theorems 4 and 5. The Corollary to Theorem 5 contains an
expression for the asymptotic efficiency of the non-parametric results for case (B).
The factor by which the original number of starting items must be multiplied
to obtain “information equivalent” non-parametric results is obtained in the
same way as for case (A). Table 1 lists values of this factor. In this case both the
number of starting items and the number of items failed are slightly increased
by use of the “equivalent” non-parametric results. The factor by which the
number of items failed is increased equals the corresponding factor for the
increase in number of starting items. For convenience of reference, however, values
of this factor are also given in Table 2. If the variance of the normal population
were unknown, the asymptotic efficiency of the non-parametric results would be
at least as great as that obtained for case (B), and likely greater.


TABLE 1

Asymptotic ratio of total numbers of items tested
(Non-parametric test over most efficient test)

Case   p = .01    .02     .05     .10     .20     .30     .40     .50     .70

(A)      377%                                     153%    155%    157%
(B)                                               114%            128%    164%
(C)      111%     114%    118%    122%    129%            148%

Finally consider case (C). Let p be replaced by β in Theorem 5 while the value
of β corresponding to a given value of p is defined by the relation in Theorem
6. By suitable choices for the values of β and $O(\sqrt{n})$ in Theorem 5, it is possible
to obtain asymptotically most efficient results for the population 100p% point
when the population variance is unknown and only the $\beta n + O(\sqrt{n})$ smallest
values of the sample are available. These results are presented in Theorem 6.
The Corollary to Theorem 6 contains an expression for the asymptotic efficiency
of the non-parametric results as compared with the corresponding results of
Theorem 6. The factor by which the number of starting items must be increased
to obtain “equivalent” non-parametric results is computed as in cases (A) and
(B). Table 1 contains values of this factor. The value of β represents the fraction
of starting items which are failed if the estimates and tests of Theorem 6 are
used. Table 2 contains corresponding values of β for certain values of p. The
factor by which the number of items failed is decreased equals p/β times the
factor by which the number of starting items was increased to obtain the
“equivalent” non-parametric results. Table 2 presents values of this factor.

The results of Theorem 6 furnish an asymptotically efficient method of
estimating and testing the smaller population percentage points while only failing
a small percentage of the starting items (for the case of normality). Since a larger
number of items are failed and much more work is required for computing the
necessary statistics, however, this method is not necessarily preferable to the
non-parametric method from the viewpoint of “information” per unit cost. In
many cases the difference in cost will be slight. Since the non-parametric results
are valid under much more general conditions, they would seem to be preferable
for these cases.


TABLE 2

Asymptotic ratio of numbers of items failed
(Non-parametric test over most efficient test)

β          .0113    .0234    .0612    .130     .287     .476     .70

Case   p = .01      .02      .05      .10      .20      .30      .40      .50

(A)        3.77%    5.40%    9.50%    16.0%    30.2%    45.9%    62.0%    78.5%
(B)                                   105%     109%     114%     120%     128%
(C)        99%      98%      96%      94%      90%      88%      85%


3. Definition of asymptotic efficiency. In this section the n observations are
assumed to be a sample from a normal population. Let the 100p% point of the
population be denoted by $\theta_p$. Several classes of results for investigating $\theta_p$ are
considered in this paper. For example, the non-parametric estimates and tests
represent one class; the asymptotically most efficient results based on the entire
sample (population variance unknown) represent another class; etc. The results
considered consist of point estimates of $\theta_p$, confidence intervals for $\theta_p$, and
significance tests for $\theta_p$ based on these confidence intervals. For a specified class,
every point estimate and every endpoint of a confidence interval (a one-sided
confidence interval has only one endpoint) consists of some statistic T whose
variance is of the form $\sigma_T^2/n + o(1/n)$ for large n. Here $\sigma_T^2$ is independent of n
and has the same value for all statistics T of the class. Also for every such
statistic T the quantity

$$\sqrt{n}\,(T - \theta_p)/\sigma_T$$

has a distribution which is asymptotically normal with unit variance and some
finite mean A which is independent of the unknown parameters of the normal
population. By suitable choice of T, the mean A can be made to have any
specified value.








Now let us define the asymptotic efficiency of the class of non-parametric
results as compared to a class of results of the type defined by (A), (B) or (C).
Let the non-parametric results be based on n sample values while the other class
of results is based on m sample values. Let the common value of $\sigma_T^2$ for the
non-parametric results be denoted by $\sigma_1^2$ while the common value of this quantity for
the other class is denoted by $\sigma_2^2$. If $\sigma_1^2/n = \sigma_2^2/m$ when m = nE, then the
asymptotic efficiency of the non-parametric results (compared to the specified class
of results) is defined to be 100E%. For the situations considered in this paper,
E is independent of n, m and the parameters of the normal population.

Asymptotic efficiency, as defined in the preceding paragraph, has the property
that the statistic (or statistics) yielded by a non-parametric result based on n
sample values has approximately the same distribution as the corresponding
statistic (or statistics) based on m sample values from the specified class if m = nE
(n large). For example, consider a non-parametric unbiased estimate $T_1$ of $\theta_p$
based on n sample values and an unbiased estimate $T_2$ of $\theta_p$ from the specified
class based on m sample values. Then, if m = nE, the distributions of

$$\sqrt{n}\,(T_1 - \theta_p)/\sigma_1, \qquad \sqrt{n}\,(T_2 - \theta_p)/\sigma_1$$

are asymptotically identical (note that $\sigma_1^2/n = \sigma_2^2/m$). Similarly for the
endpoints of confidence intervals. Consequently the power functions of significance
tests based on corresponding confidence intervals are asymptotically identical
if m = nE. It would therefore appear that the definition chosen for asymptotic
efficiency is suitable for the situations to which it is applied.

4. Notation. In this paper $t(1), \ldots, t(n)$ will represent the values of the set
of all n observations arranged in increasing order of magnitude. Then

$$t(1), \ldots, t(r)$$

are the r smallest values of the set of n observations. The notation t(r) has
meaning only if r is an integer such that 1 ≤ r ≤ n. Often, however, expressions of
the form $t[pn + O(\sqrt{n})]$ will be encountered. In what follows, an expression of
the form t(z) has the interpretation t(largest integer ≤ z). For example,

t(487½) = t(487).

Also the $r = pn + O(\sqrt{n})$ smallest observations are frequently referred to;
here r is interpreted to be the largest integer contained in $pn + O(\sqrt{n})$; etc.
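
The indexing convention just described can be sketched in a few lines of Python. This is only an illustration: the function names `order_stat` and `smallest_r`, and the argument `c` standing in for the coefficient of the $\sqrt{n}$ term, are hypothetical and do not appear in the paper.

```python
import math


def order_stat(sorted_values, z):
    """Return t(z): the order statistic whose rank is the largest integer <= z.

    `sorted_values` must already be in increasing order; ranks are 1-based,
    as in the text, while Python lists are 0-based.
    """
    rank = math.floor(z)
    if not 1 <= rank <= len(sorted_values):
        raise ValueError("t(z) is defined only for 1 <= floor(z) <= n")
    return sorted_values[rank - 1]


def smallest_r(values, p, c=0.0):
    """Return the r smallest values, r = largest integer in p*n + c*sqrt(n).

    The constant `c` is an assumed stand-in for the O(sqrt(n)) term.
    """
    n = len(values)
    r = math.floor(p * n + c * math.sqrt(n))
    return sorted(values)[:r]
```

For instance, `order_stat(sorted(data), 487.5)` returns the 487th smallest value, in line with the example above.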

5. Theorems and derivations. First let us consider some well known estimates
and tests of the population percentage points which are based on statistics of 
the type used for the sign test. These estimates and tests are valid under ex¬ 
tremely general conditions. It is not necessary that the observations be drawn 
from the same population or even that any two observations come from the 
same population. Population percentage points are not necessarily unique. The 
strongest continuity restriction imposed is that the population cdf be continuous 
at the percentage point considered. These results follow from 





Theorem 1. Let $t(1), \ldots, t(n)$ represent the values of n observations arranged
in increasing order of magnitude. The n observations are statistically independent
and from populations which satisfy the conditions:

(I). The populations have at least one 100p% point in common.

(II). If the populations have only one common 100p% point, the cdf of each
population is continuous at that point.

Let $\theta_p$ denote the value of the common 100p% point if it is unique, or the open
interval of common 100p% points otherwise (i.e., the interval of common 100p% points
with its endpoints deleted). Then asymptotically (n → ∞)

(i). t(pn) is a median estimate of $\theta_p$.

(ii). $\Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p\} = \Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p\} = \alpha$,

where $K_\alpha$ is the standardized normal deviate exceeded with probability α. Relations
(i) and (ii) are approximately satisfied if pn > 5 and p ≤ ½.

Proof. This theorem is a direct application of the binomial theorem.
Conditions (I) and (II) assure that the equality between the probabilities in (ii)
holds. Relations (i) and (ii) are obtained by using the normal approximation to
the binomial theorem; this approximation is reasonably accurate if pn > 5 and
p ≤ ½ (see [2]).

The non-parametric confidence intervals investigated are of the forms

$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p, \qquad t[pn + B_2\sqrt{n} + o(\sqrt{n})] > \theta_p,$$

$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p < t[pn + B_2\sqrt{n} + o(\sqrt{n})] \qquad (B_1 < B_2),$$

(these intervals have the same confidence coefficient if < is replaced by ≤ and
> by ≥). The significance tests considered are those obtained from these
confidence intervals while the point estimates of $\theta_p$ are based on single order
statistics of the form $t[pn + B\sqrt{n} + o(\sqrt{n})]$.

When $\theta_p$ is an open interval, (i) and (ii) need interpretation. The meaning of
(i) is that the probability of t(pn) exceeding every value of $\theta_p$ has the value ½
and that the probability of it being less than all values of $\theta_p$ also has the value ½.
The inequality $t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p$ has the interpretation that every
value of $\theta_p$ is greater than or equal to $t[pn + K_\alpha\sqrt{np(1-p)}]$. Similarly for
$t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p$.

The purpose in introducing the case where $\theta_p$ is an open interval was to point
out that situations where population percentage points are not unique cause
little difficulty if suitably interpreted.

Non-parametric results of the type considered in Theorem 1 are also available
when the sample size is not large. For any sample size n, if the conditions of
Theorem 1 are satisfied,

$$\Pr[t(r) < \theta_p] = \Pr[t(r) \le \theta_p] = \sum_{s=r}^{n} \frac{n!}{s!\,(n-s)!}\, p^{s}(1-p)^{n-s}.$$

The probability relations in Theorem 1 were obtained by approximating this
summation for large n. By suitable choice of r, confidence intervals and
significance tests with a wide range of satisfactory confidence coefficients and
significance levels can usually be obtained for a given value of n.
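
As a rough numerical companion to the exact binomial sum and its normal approximation in Theorem 1 (ii), the following Python sketch may help. The function names are hypothetical, and the deviate $K_\alpha$ is obtained from the standard library's `statistics.NormalDist`, which is an implementation choice and not something the paper specifies.

```python
import math
from statistics import NormalDist


def prob_order_stat_below(n, r, p):
    """Exact Pr[t(r) < theta_p] = sum_{s=r}^{n} C(n, s) p^s (1 - p)^(n - s)."""
    return sum(
        math.comb(n, s) * p**s * (1.0 - p) ** (n - s) for s in range(r, n + 1)
    )


def approx_rank(n, p, alpha):
    """Largest integer in pn + K_alpha * sqrt(np(1 - p)), as in Theorem 1 (ii).

    K_alpha is the standard normal deviate exceeded with probability alpha.
    """
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return math.floor(n * p + k_alpha * math.sqrt(n * p * (1.0 - p)))
```

Comparing `prob_order_stat_below(n, approx_rank(n, p, alpha), p)` with `alpha` for a few values of n gives a feel for how quickly the large-sample relation becomes usable.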

The above discussion emphasizes the generality of application of the
non-parametric estimates and tests. For most practical situations, however, it is
permissible to assume that the observations are a random sample from a
population which has a probability density function that is non-zero over the range of
definition and differentiable several times. Then asymptotically t(pn) is also a
mean estimate of $\theta_p$ (which is now necessarily a single point). Moreover, the
asymptotic distribution of $t[pn + C\sqrt{n} + o(\sqrt{n})]$ can be found in terms of p,
C, $\theta_p$ and the value of the probability density function at $\theta_p$. These results are
a consequence of

Theorem 2. Let the population from which the n sample values were drawn have
a pdf f(t) such that f(t) ≠ 0 over its range of definition and f'(t) exists and is
continuous in some neighborhood of $t = \theta_p$. Then the variable

$$\sqrt{n/p(1-p)}\; f(\theta_p)\,\{t[pn + C\sqrt{n} + o(\sqrt{n})] - \theta_p\}$$

has a distribution which approaches the normal distribution with mean

$$C/\sqrt{p(1-p)}$$

and unit variance as n → ∞.

Proof. If pn is replaced by $pn + C\sqrt{n} + o(\sqrt{n})$, the method used to prove
this theorem is completely analogous to the proof presented on pp. 368-69 of [3].
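
A small seeded simulation can make the statement of Theorem 2 concrete. The sketch below is not part of the paper: it assumes a uniform population on (0, 1), so that $f(\theta_p) = 1$ and $\theta_p = p$, and the function names and default arguments are illustrative choices only.

```python
import math
import random
import statistics


def standardized_order_stat(n, p, c, rng):
    """One draw of sqrt(n / (p(1-p))) * f(theta_p) * (t[pn + c sqrt(n)] - theta_p).

    Assumes a uniform(0, 1) population, for which f(theta_p) = 1, theta_p = p.
    """
    sample = sorted(rng.random() for _ in range(n))
    r = math.floor(p * n + c * math.sqrt(n))  # largest integer in pn + c sqrt(n)
    return math.sqrt(n / (p * (1.0 - p))) * (sample[r - 1] - p)


def simulate(n=2000, p=0.1, c=0.5, reps=1000, seed=0):
    """Return the sample mean and variance of the standardized variable."""
    rng = random.Random(seed)
    draws = [standardized_order_stat(n, p, c, rng) for _ in range(reps)]
    return statistics.fmean(draws), statistics.variance(draws)
```

Under the theorem the mean should be near $C/\sqrt{p(1-p)}$ and the variance near 1 for large n; with finite n some discrepancy is to be expected.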

Now let us consider the asymptotically most efficient results for estimating
and testing $\theta_p$ based on the entire set of observations for the case of a sample
from a normal population (population variance unknown).

Theorem 3. Let the n observations be a sample from a normal population
(unknown variance $\sigma^2$). Asymptotically the most efficient point estimates, confidence
intervals and significance tests for $\theta_p$ using all the observations are those based on
the non-central t-statistic. The value of $\sigma_T^2$ (see Section 3) for these results based on
the non-central t-statistic is $\sigma^2(1 + K_p^2/2)$.

Corollary. For case (A) the asymptotic efficiency of the non-parametric results
equals

$$100\,(1 + K_p^2/2)\big/\big[2\pi p(1-p)\exp(K_p^2)\big]\ \%.$$

Proof. The maximum likelihood estimate of $\theta_p$ based on all n sample values is

$$\frac{1}{n}\sum_{i=1}^{n} t(i) - K_p\sqrt{\sum_{i=1}^{n}\Big[t(i) - \frac{1}{n}\sum_{j=1}^{n} t(j)\Big]^2 \Big/ (n-1)}. \tag{1}$$

This quantity is equivalent to the non-central t-statistic, as can be seen by
multiplying and dividing $[(1) - \theta_p]$ by

$$\sqrt{\sum_{i=1}^{n}\Big[t(i) - \frac{1}{n}\sum_{j=1}^{n} t(j)\Big]^2 \Big/ (n-1)}.$$

From maximum likelihood theory, (1) is an efficient estimate of $\theta_p$.
Asymptotically (n → ∞) the variance of (1) is of the form

$$\sigma^2(1 + K_p^2/2)/n + o(1/n),$$

and it is easily seen that the variance of an endpoint of a confidence interval for
$\theta_p$ based on the non-central t-statistic is also of this form. The corollary follows
from combining Theorem 2 with Theorem 3.
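
The efficiency in the Corollary to Theorem 3 is easy to evaluate numerically. The Python sketch below does so under the assumption that $K_p$ is the upper standard normal deviate, computed here with `statistics.NormalDist`; the function names are illustrative and not from the paper.

```python
import math
from statistics import NormalDist


def upper_deviate(p):
    """K_p: the standard normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def efficiency_case_a(p):
    """Efficiency E (as a fraction) from the Corollary to Theorem 3."""
    k = upper_deviate(p)
    return (1.0 + k * k / 2.0) / (2.0 * math.pi * p * (1.0 - p) * math.exp(k * k))


def starting_items_factor(p):
    """Reciprocal of the efficiency: the multiplier for starting items."""
    return 1.0 / efficiency_case_a(p)


def items_failed_factor(p):
    """p times the starting-items factor, as described for case (A)."""
    return p * starting_items_factor(p)
```

For p = .50 the starting-items factor reduces to π/2, about 157%, and for p = .30 the items-failed factor comes out near 46%; both agree with the surviving case (A) entries of Tables 1 and 2.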

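Next let us investigate the situation where only the $r = pn + O(\sqrt{n})$ smallest
values of a sample of size n from a normal population with mean μ and variance
$\sigma^2$, denoted by $N(\mu, \sigma^2)$, are available. First let us consider the asymptotic
distribution of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} - \mu + \frac{\sigma\,(n-r)(b_p + 2a_pK_p)}{r + 2a_p(n-r)}\right] \Big/ \frac{\sigma}{\sqrt{r + 2a_p(n-r)}}, \tag{2}$$

where

$$a_p = \frac{K_p}{2\sqrt{2\pi}\,(1-p)\exp(\tfrac12 K_p^2)} + \frac{1}{4\pi(1-p)^2\exp(K_p^2)}, \qquad b_p = \frac{1}{\sqrt{2\pi}\,(1-p)\exp(\tfrac12 K_p^2)}.$$

This distribution is given by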

Theorem 4. Let $t(1), \ldots, t(r)$ be the $r = pn + O(\sqrt{n})$ smallest values
(arranged in increasing order of magnitude) of a sample of size n from $N(\mu, \sigma^2)$. Then
asymptotically (n → ∞) the distribution of (2) is N(0, 1).

Corollary. Let $r = pn + C\sqrt{n} + o(\sqrt{n})$. Then as n increases the distribution
of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} - \mu + \frac{\sigma\,(1-p)(b_p + 2a_pK_p)}{p + 2a_p(1-p)}\right] \Big/ \frac{\sigma}{\sqrt{r + 2a_p(n-r)}}$$

approaches the normal distribution with unit variance and mean

$$C(b_p + 2a_pK_p)\big/\big[p + 2a_p(1-p)\big]^{3/2}.$$

Proof. The proof of this theorem is long and will be deferred to section 6 of 
the paper. 

If the value of σ is known, the Corollary to Theorem 4 can be used to obtain
point estimates, confidence intervals and significance tests for any population
percentage point (including μ). The resulting estimates and tests are
asymptotically most efficient. This follows from

Theorem 5. Consider the $r = pn + O(\sqrt{n})$ smallest values of a sample of size
n from $N(\mu, \sigma^2)$ where σ is known. Asymptotically (n → ∞) the variance of every
unbiased estimate of μ based on only $t(1), \ldots, t(r)$ and σ is greater than or equal
to a quantity of the form

$$\sigma^2\big/\,n\,[p + 2a_p(1-p)] + o(1/n).$$


Corollary. For case (B) the asymptotic efficiency of the non-parametric results is

$$100\,\frac{\exp(-K_p^2)}{2\pi p(1-p)\,[p + 2a_p(1-p)]}\ \%.$$

Proof. The proof of this theorem is similar to the proof presented for
Theorem 4 and will be given in section 6 following the proof of Theorem 4.
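
To see how the constants $a_p$ and $b_p$ and the lower bound of Theorem 5 behave numerically, one might use a sketch like the following. It assumes the forms of $a_p$ and $b_p$ given with expression (2) and takes the case (B) efficiency to be the ratio of the Theorem 5 bound to the non-parametric variance from Theorem 2; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_deviate(p):
    """K_p: the standard normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def a_coef(p):
    """a_p as defined with expression (2)."""
    k = upper_deviate(p)
    first = k / (2.0 * math.sqrt(2.0 * math.pi) * (1.0 - p) * math.exp(0.5 * k * k))
    second = 1.0 / (4.0 * math.pi * (1.0 - p) ** 2 * math.exp(k * k))
    return first + second


def b_coef(p):
    """b_p as defined with expression (2)."""
    k = upper_deviate(p)
    return 1.0 / (math.sqrt(2.0 * math.pi) * (1.0 - p) * math.exp(0.5 * k * k))


def efficiency_case_b(p):
    """exp(-K_p^2) / (2 pi p (1 - p) [p + 2 a_p (1 - p)]), as a fraction."""
    k = upper_deviate(p)
    info = p + 2.0 * a_coef(p) * (1.0 - p)
    return math.exp(-k * k) / (2.0 * math.pi * p * (1.0 - p) * info)
```

With these definitions `1 / efficiency_case_b(0.5)` is about 1.285 and `1 / efficiency_case_b(0.3)` about 1.138, in line with the 128% and 114% entries listed for case (B).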

Let p be replaced by β in Theorem 4. Even if σ is unknown, asymptotically
most efficient estimates and tests can be obtained for the 100p% point of the
population if β is defined by

$$K_p = (1-\beta)(b_\beta + 2a_\beta K_\beta)\big/\big[\beta + 2a_\beta(1-\beta)\big]. \tag{3}$$


Theorem 6. Let p, (0 < p < ½), be given and β defined by (3). Let $t(1), \ldots, t(r)$
be the $r = \beta n + C\sqrt{n} + o(\sqrt{n})$ smallest values of a sample of size n from a normal
population. Then asymptotically

$$\Pr\left\{\frac{\sum_{i=1}^{r} t(i) + 2a_\beta(n-r)\,t(r)}{r + 2a_\beta(n-r)} < \theta_p\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-C(b_\beta + 2a_\beta K_\beta)/[\beta + 2a_\beta(1-\beta)]^{3/2}} e^{-x^2/2}\,dx.$$


Corollary. For case (C) the asymptotic efficiency of the non-parametric results is

$$100\,\frac{\exp(-K_p^2)}{2\pi p(1-p)} \Big/ \left[\beta + \frac{K_\beta \exp(-\tfrac12 K_\beta^2)}{\sqrt{2\pi}} + \frac{\exp(-K_\beta^2)}{2\pi(1-\beta)}\right]\ \%.$$

Proof. Theorem 6 is an immediate consequence of relation (3) and the
Corollary to Theorem 4. The Corollary to Theorem 6 follows from Theorem 2 and
Theorem 6.
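
Relation (3) defines β only implicitly, so a numerical solution is needed in practice. The sketch below is one possible approach, not the author's procedure: it solves (3) by bisection, assuming the right-hand side decreases in β on (0, 1), which was checked only at a few points. The function names and bracket endpoints are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def upper_deviate(p):
    """K_p: the standard normal deviate exceeded with probability p."""
    return _STD.inv_cdf(1.0 - p)


def _phi_and_weight(beta):
    """Return (normal density at K_beta, 2 * a_beta * (1 - beta))."""
    k = upper_deviate(beta)
    phi = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    return phi, k * phi + phi * phi / (1.0 - beta)


def rhs_of_relation_3(beta):
    """(1 - beta)(b_beta + 2 a_beta K_beta) / [beta + 2 a_beta (1 - beta)]."""
    phi, weight = _phi_and_weight(beta)  # note (1 - beta) * b_beta = phi
    return (phi + weight * upper_deviate(beta)) / (beta + weight)


def solve_beta(p, iterations=80):
    """Bisection for the beta satisfying relation (3); assumes 0 < p < 1/2."""
    target = upper_deviate(p)
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rhs_of_relation_3(mid) > target:
            lo = mid  # right-hand side still too large: move to larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


def starting_items_factor_case_c(p):
    """Reciprocal of the case (C) efficiency from the Corollary to Theorem 6."""
    beta = solve_beta(p)
    k = upper_deviate(p)
    _, weight = _phi_and_weight(beta)
    return 2.0 * math.pi * p * (1.0 - p) * math.exp(k * k) * (beta + weight)
```

Under these assumptions `solve_beta(0.2)` is close to the value .287 and `solve_beta(0.4)` close to the value .70 listed in Table 2.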


6. Long proofs. This section contains the long proof of Theorem 4 and the
related proof of Theorem 5.

6.1. Proof of Theorem 4. If t(r) is such that

$$\mu - K_p\sigma - n^{-4/10} \le t(r) \le \mu - K_p\sigma + n^{-4/10},$$

the ratio of the value of the joint probability density function f of $t(1), \ldots, t(r)$
to the value of the function

$$\frac{n!\,(1-p)^{n-r}}{(n-r)!}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{r} \exp\left\{-\frac12\sum_{i=1}^{r}\left[\frac{t(i)-\mu}{\sigma}\right]^2 - (n-r)\,a\left[\frac{t(r)-\mu}{\sigma} + K_p\right]^2 - (n-r)\,b\left[\frac{t(r)-\mu}{\sigma} + K_p\right]\right\} \tag{4}$$

is of the form 1 + o(1). Here (and in the remainder of section 6) $a = a_p$, $b = b_p$.
Also, for large n and any positive ε, the integral of f over the ranges of the
$t(1), \ldots, t(r-1)$ and for t(r) between $\mu - K_p\sigma - n^{-1/2+\epsilon}$ and $\mu - K_p\sigma + n^{-1/2+\epsilon}$
differs from unity by a quantity which is of the order o(1), i.e., a quantity which
→ 0 as n → ∞.

Now consider the moment generating function of (2), i.e., $E[e^{\theta(2)}]$. In
evaluating this function of θ, let the range of integration of t(r), (i.e., the range after the
other variables have been integrated out), be subdivided into the five intervals

$-\infty$ to $\mu - D\sigma$, $\quad \mu - D\sigma$ to $\mu - K_p\sigma - n^{-4/10}$,

$\mu - K_p\sigma - n^{-4/10}$ to $\mu - K_p\sigma + n^{-4/10}$,

$\mu - K_p\sigma + n^{-4/10}$ to $\mu + D\sigma$, $\quad \mu + D\sigma$ to $\infty$.

Here D is a positive constant which is independent of n and such that 

(i/£) n_r a/p) r - i [i/(i - v )] n ~ r < expf- g j - r ^ b + 

L y/r -f 2a(n — r) J 

for n sufficiently large and

$$D > |K_p| + n^{-4/10}/\sigma, \qquad 1 - N(D) = N(-D) < e^{-D^2/2}/D,$$

where

$$N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy.$$

First let us consider the interval $\mu - K_p\sigma - n^{-4/10}$ to $\mu - K_p\sigma + n^{-4/10}$. Using
(4) in place of f, completing the square in the exponent, making the change of
variable

$$x(i) = t(i) - \theta/\sqrt{r + 2a(n-r)} \qquad (i = 1, \ldots, r),$$

integrating $x(1), \ldots, x(r-1)$ over their ranges and then x(r) over the interval

$$\mu - K_p\sigma - n^{-4/10} - \theta/\sqrt{r + 2a(n-r)} \quad \text{to} \quad \mu - K_p\sigma + n^{-4/10} - \theta/\sqrt{r + 2a(n-r)},$$

an expression of the form

$$\exp(\theta^2/2) + o(1) \tag{5}$$

is obtained. From the above results, this expression differs from the corresponding
integration of f by a term of order o(1); hence the contribution to the mgf
for the interval considered is of the form (5).

Next consider the interval $\mu - K_p\sigma + n^{-4/10}$ to $\mu + D\sigma$. After $t(1), \ldots,$
$t(r-1)$ have been integrated out, the integrand becomes
ttl L[ t(r) ~ P _ 6 

(r — 1) i(m — r)! \ |_ <r y/r + 2a(n — r) 

20a(n - r) V t(r) - p 1 b8(n - r) 1 

y/r + 2 a(n — r) L a * J Vr + 2a(n — r) J 



( 6 ) 





By writing {N[(t(r) — n)/<r — 6/y/r + 2 a(n — r)]] r 1 in the form 
[N ^ + o(l))/y/r + 2 a(n - r ) 

. N V2i exp 


and maximizing exp (20a(n — r)[t(r) — n]/ay/r + 2a(n — r )} with respect to 
t(r) m the specified interval, it is seen that the value of (6) is less than an expres¬ 
sion of the form 

W^ll 1 - w [^F +o(i) 

for n sufficiently large. Differentiation shows that (lV[]) r_1 {l — iV'[]} n-r is a 
decreasing function of f(r) in the specified interval if n is large enough. Also, if 
t(r) = n — K P <r + n~ in °, for large n the value of 


• ~ f Cr) - M 

<r 




- - p) 


(,n— r) 6/10 


is less than a constant which is less than unity. Thus the value of (6) is less than 
a quantity of the form 

F=W^"ryi exp ( - clnl/10) + o(1) ’ 

which in turn is less than an expression of the form 
CiVn exp (-CW 110 ) + o(l) 

for n sufficiently large. Thus the integral of (6) over the specified interval is of 
the order o(l). An analogous proof shows that the contribution to the mgf for 
the interval $\mu - D\sigma$ to $\mu - K_p\sigma - n^{-4/10}$ is also of order $o(1)$.

Finally consider the interval $\mu + D\sigma$ to $\infty$. For large $n$ the integral of (6)
over this interval is less than an expression of the form


(7) "r-l)l(n-x)l L. < ”‘ P { “ 4< " " r) } m + “ H>; 

i.e., the contribution to the mgf for this interval is of the order o(l) since the 
coefficient of the integral is less than an expression of the form $C\sqrt{n}$. The upper
limit (7) was obtained by replacing 

N l[f(r) — n]/a — 9/y/r + 2 a(n — r )} by 1, 
i —*[!«_-] by 





(1 /D) n ~ r (l/p) r_1 [1/(1 - p)] n_r exp [-fl(n - r) {b + 2 aK v )/\/r + 2a{n - r)] 

by 1. 


A similar type of proof shows that the integral of (6) from $-\infty$ to $\mu - D\sigma$ is also
of the order $o(1)$.

Thus the mgf of (2) is of the form (5) for large n and Theorem 4 is verified. 

6.2. Proof of Theorem 5. Let us consider a single sample value from the multivariate
population consisting of the $r$ smallest order statistics of a sample of size
$n$ from $N(\mu, \sigma)$, where $\sigma^2$ is known. Then the variance of every unbiased estimate
of $\mu$ based on this sample and the value of $\sigma^2$ is greater than or equal to the
reciprocal of


( 8 ) 



fdt( 1) • • • dt(r) 



d 2 log/ 

d/P 


fdt(X) • ■ ■ dt{r), 


where $f$ is the joint pdf of the $r$ smallest order statistics of a sample of size $n$
from $N(\mu, \sigma)$. For proof of this statement see pp. 480-81 of [3]. In the lower part
of (8) the variables $t(1), \cdots, t(r - 1)$ can be integrated out leaving an explicit
function of $t(r)$ to be integrated from $-\infty$ to $\infty$. To evaluate this integral for
large $n$, choose some large but fixed interval $\mu - D\sigma$ to $\mu + D\sigma$ as was done in
the proof of Theorem 4. Using a method similar to that presented on pp. 368-69
of [3], the value of the integral for the interval $\mu - D\sigma$ to $\mu + D\sigma$ is found
to be of the form

$n[p + 2a(1 - p)]/\sigma^2 + o(n)$.

A procedure analogous to that used in the latter part of section 6.1 shows that 
integration outside this interval yields an expression of order o(n). 


REFERENCES 

[1] Paul H. Jacobson, “The relative power of three statistics,” Jour. Amer. Stat. Assn.,
Vol. 42 (1947), pp. 575-584.

[2] Paul G. Hoel, Introduction to Mathematical Statistics, John Wiley and Sons, 1947, p. 45.

[3] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.



ON THE RELATIVE EFFICIENCIES OF BAN ESTIMATES 1
By Leo Katz 
Michigan State College 

1. Introduction. J. Neyman [3] defined BAN (best asymptotically normal)
estimates as those functions of observed relative frequencies which i) are
consistent, ii) are asymptotically normally distributed, iii) are asymptotically
efficient and iv) possess continuous partial derivatives with respect to each relative
frequency. He suggested the following two problems: first, to determine the
class of estimates which possess the above four properties and second, to
investigate this class of estimates to see whether, and under what conditions, the use of
some of them is preferable to the use of others. Neyman’s paper dealt with the
first problem directly and with the second obliquely. With respect to the first
problem, he showed that two types of $\chi^2$-minimum estimates belong to the
class of BAN estimates as do, obviously, maximum likelihood (ML) estimates.
On the second problem, the $\chi^2$-minimum estimates may be more easily computed
than the corresponding ML estimates in many cases, the ease of computation
being especially pronounced for the modified $\chi^2$ with observed, rather than
expected, relative frequencies in the denominators. The present paper contains
some additional information regarding the relative merits of these estimates.
For simplicity, we shall consider a random variable taking on values

$x = 0, 1, 2, 3, \cdots$

with probabilities $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$ depending on $r$ parameters. In working
with $\chi^2$-minimum estimates, it is almost always necessary to truncate the
probability law, taking

(1.1)  $f(x) = p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$, $x = 0, 1, \cdots, k - 1$, and

       $f(k) = \sum_{x=k}^{\infty} p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$.

The ML estimates are asymptotically efficient, i.e., have minimum variance,
with respect to the probability law, $p(x \mid \theta)$, and the $\chi^2$ estimates have the same
property with respect to the truncated p. l., $f(x \mid \theta)$. This suggests that the
optimum variances of the estimates of the parameters of the two in samples of $N$
may differ and, further, that the minimum variance of the $\chi^2$ estimates may
depend essentially upon the choice of $k$. In the course of some unpublished work by
Evelyn Fix and others in the Statistical Laboratory at the University of
California on $\chi^2$ estimation of the parameters of several different p. l.’s the same
anomalous situation occurred repeatedly. When the observed data were fitted

1 This paper was presented to a joint meeting of the American Mathematical Society 
and the Institute of Mathematical Statistics at Boulder, Colorado on September 1, 1949. 



by the truncated p. l. with the estimated parameters, the fit appeared to be
improved when $k$ was chosen smaller. This suggested that perhaps, contrary to
intuition, it might be possible to improve the precision of estimation by
choosing $k$ smaller, within certain limits. This paper proves that this notion is false
and that some other explanation of this phenomenon is needed.


2. Relative efficiency. Cramér [1] has shown, simultaneously with Rao [6],
that under mild conditions of regularity, the variance of an unbiased estimate,
$\theta^* = \theta^*(x_1, x_2, \cdots, x_N)$, of a single parameter, $\theta$, where $x_1, x_2, \cdots, x_N$ are
the observed sample, satisfies the following inequality for fixed $N$:

(2.1)  $D^2(\theta^*) \ge \dfrac{1}{N E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right]}$,

the lower bound being attained only by “efficient” statistics. We may take as
a measure of the relative precision attainable in the estimation of the parameter
of the truncated p. l. (1.1) the ratio of the lower bounds (2.1) of variances of the
estimates of the parameters of the original p. l., $p(x \mid \theta)$, and of the truncated
p. l., $f(x \mid \theta)$. We define

(2.2)  Rel. Eff. $= \dfrac{E\left[\left(\dfrac{\partial \log f(x)}{\partial \theta}\right)^2\right]}{E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right]}$.


In the case of functions depending on several parameters, $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
and unbiased estimates, $\theta_i^*$, which are functions of the observed relative
frequencies, with non-singular covariance matrix $\|L_{ij}\|$, Cramér [1] showed that
the fixed ellipsoid,

(2.3)  $N \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma_{ij} t_i t_j = r + 2$,

where

$\sigma_{ij} = E\left[\dfrac{\partial \log p}{\partial \theta_i} \dfrac{\partial \log p}{\partial \theta_j}\right]$,

lies wholly within the concentration ellipsoid,

(2.4)  $\sum_{i=1}^{r} \sum_{j=1}^{r} L^{ij} t_i t_j = r + 2$,

where $\|L^{ij}\| = \|L_{ij}\|^{-1}$. The two ellipsoids coincide if and only if the $\theta_i^*$ are
joint efficient estimates of the $\theta_i$. Thus, the covariance matrix of a set of joint
efficient estimates is $\|N\sigma_{ij}\|^{-1}$. In this case, we may define separately the
relative efficiency with respect to each of the parameters as in (2.2) or we may
consider the set of estimates for one function to possess greater concentration





than the set for the other function if the fixed ellipsoid (2.3) for the first lies 
wholly within the similar ellipsoid for the second. The latter will be the procedure 
we adopt in section 5. 


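3. Estimation of a single parameter. With $p(x \mid \theta)$ and $f(x \mid \theta)$ defined as in
(1.1), form the difference

(3.1)  $\phi(k) = E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right] - E\left[\left(\dfrac{\partial \log f(x)}{\partial \theta}\right)^2\right]$.

The regularity conditions under which the Cramér-Rao inequality (2.1) holds
involve existence of $\partial p(x)/\partial\theta$ for all $x$ and absolute convergence of

$\sum_{x} \dfrac{\partial p(x)}{\partial \theta}$.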


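Assuming we have a regular case of estimation in Cramér’s sense so that these
conditions hold, we may write

(3.2)  $\phi(k) = \sum_{x=k}^{\infty} \dfrac{1}{p(x)} \left[\dfrac{\partial p(x)}{\partial \theta}\right]^2 - \dfrac{1}{f(k)} \left[\dfrac{\partial f(k)}{\partial \theta}\right]^2$,

and, since $\partial f(k)/\partial\theta = \sum_{x=k}^{\infty} (\partial p(x)/\partial\theta)$ by the second of the regularity conditions
above and $f(k) = \sum_{x=k}^{\infty} p(x)$ by (1.1),

(3.3)  $f(k)\phi(k) = \sum_{x=k}^{\infty} p(x) \sum_{x=k}^{\infty} \left[\dfrac{1}{\sqrt{p(x)}} \dfrac{\partial p(x)}{\partial \theta}\right]^2 - \left[\sum_{x=k}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2$.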


By the Cauchy inequality, the right member of (3.3) is non-negative and, since
$f(k) > 0$, it follows that $\phi(k) \ge 0$, with the sign of equality holding only when
$\partial p(x)/\partial\theta$ is proportional to $p(x)$ for all $x \ge k$. In this event, $p(x) = K_\theta e^{g(x)}$, where
$K_\theta$ is a constant depending on $\theta$. Now, if $g(x)$ is constant, $p(x)$ is a rectangular
p. l. On the other hand, if $g(x)$ is not constant, there are two cases which must
be considered, namely:

a) $p(x) = K_\theta e^{g(x)}$, $x \ge 0$, and

b) $p(x) = p_1(x \mid \theta)$, $0 \le x < a \le k$,

        $= K_\theta e^{g(x)}$, $x \ge a$.

In the first case, $K_\theta = \left(\sum_{x=0}^{\infty} e^{g(x)}\right)^{-1}$ and is independent of $\theta$, so that we do not
have a case of estimation at all. In the second case, each $p(x)$ for $x \ge a$ is known
a priori to within a multiplicative constant depending on $\theta$ and, hence, no
essential information is lost in truncation. Thus, except in these trivial cases, the
relative efficiency is less than unity.

It then appears that, in every case of regular estimation, the variance of an
efficient estimate of the parameter of the p. l. $p(x \mid \theta)$ is less than the
corresponding variance for the truncated p. l. $f(x \mid \theta)$ and that, as an immediate
consequence, the ML estimate in general is capable of greater precision than





the $\chi^2$-minimum estimate for fixed $N$. This is the result mentioned in the first
paragraph of section 1. It should be pointed out that the regularity conditions
for the Cramér-Rao inequality are stringent enough to give this result. To
complete the argument for estimation of a single parameter, form the function

(3.4)  $\psi(k) = p(k) \sum_{x=k}^{\infty} p(x) \sum_{x=k+1}^{\infty} p(x)\, [\phi(k) - \phi(k + 1)]$,


where </>(fc) is defined by (3.1). Using (3.1) and (1.1), we may write 



Making use of (3.5), straightforward algebraic reduction of (3.4) gives 

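(3.6)  $\psi(k) = \left[\dfrac{\partial p(k)}{\partial \theta} \sum_{x=k+1}^{\infty} p(x) - p(k) \sum_{x=k+1}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2 \ge 0$,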

the sign of equality holding again only for the p. l.’s discussed after (3.3). Since
the first three factors in the right member of (3.4) are positive, it follows that
$\phi(k)$ is a strictly decreasing function of $k$. Thus, the variance of an efficient
estimate of the parameter of a truncated p. l., $f(x)$, depends upon the choice of $k$
and decreases in strictly monotone fashion to the variance of the original p. l.,
$p(x)$, as limit. As a result, the anomalous situation mentioned in the second
paragraph of section 1 does not arise through irregularity in the behavior of this
variance.
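
The monotone behavior just described can be checked numerically. The following Python sketch is an illustration only and is not part of the original argument. It assumes a Poisson law with an arbitrarily chosen $\lambda = 2$, takes $\phi(k)$ to be the expected squared derivative of $\log p$ minus that of $\log f$ (the information lost by pooling all $x \ge k$ into one class), and cuts the infinite sums off after a hypothetical number of terms, `terms`. All function names are ours.

```python
import math

def poisson_pmf(x, lam):
    # Evaluate in log space so that lam**x and x! cannot overflow for large x.
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def phi(k, lam, terms=100):
    """Information lost by pooling all x >= k into one class (Poisson law).

    phi(k) = sum_{x>=k} p(x) s(x)^2 - (sum_{x>=k} p(x) s(x))^2 / sum_{x>=k} p(x),
    where s(x) = x/lam - 1 is the derivative of log p(x) with respect to lam.
    The infinite sums are cut off after `terms` terms (an assumption of this sketch).
    """
    tail = d_tail = info_tail = 0.0
    for x in range(k, k + terms):
        p = poisson_pmf(x, lam)
        s = x / lam - 1.0
        tail += p            # f(k)
        d_tail += p * s      # derivative of f(k) with respect to lam
        info_tail += p * s * s
    return info_tail - d_tail * d_tail / tail

lam = 2.0
values = [phi(k, lam) for k in range(0, 11)]
for k, v in enumerate(values):
    print(k, v)
```

With $k = 0$ everything is pooled into a single class, so the whole information $1/\lambda$ is lost; as $k$ grows the printed values should fall steadily toward zero.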

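4. Poisson and binomial probability laws. The Poisson p. l., $p(x \mid \lambda) = e^{-\lambda}\lambda^x/x!$,
gives immediately

(4.1)  $\dfrac{\partial \log p(x)}{\partial \lambda} = \dfrac{x}{\lambda} - 1$,

whence, from (2.1), we obtain the usual result that the variance of the efficient
unbiased estimate of $\lambda$ is $\lambda/N$. The truncated p. l. has $\partial \log f(x)/\partial\lambda = (x/\lambda) - 1$
for $x \le (k - 1)$, and $(\partial \log f(k))/\partial\lambda = p(k - 1)/\sum_{x=k}^{\infty} p(x)$.
Thus,

(4.2)  $E\left[\left(\dfrac{\partial \log f(x)}{\partial \lambda}\right)^2\right] = \dfrac{1}{\lambda}\left[\sum_{x=0}^{k-1} p(x) + (\lambda - k)\,p(k - 1)\right] + \dfrac{[p(k - 1)]^2}{\sum_{x=k}^{\infty} p(x)}$.

Writing $P(k - 1)$ for $\sum_{x=0}^{k-1} p(x)$, we obtain finally

(4.3)  Rel. Eff.$_{\text{Poisson}}(k) = P(k - 1) + (\lambda - k)\,p(k - 1) + \dfrac{\lambda[p(k - 1)]^2}{1 - P(k - 1)}$.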





Values of $p(k)$ and $1 - P(k - 1)$ are given directly in Molina’s Tables [2] for
integer values of $k$ and $\lambda$ = .001(.001).01(.01).3(.1)15(1)100, or may be
obtained indirectly from Pearson’s Tables [4] of the incomplete $\Gamma$-function. In
the classical example of a Poisson p. l. quoted by von Bortkiewicz, relating to
numbers of deaths due to kicks by horses in Prussian Army Corps, $N = 200$
and the average number of deaths per corps-year is .61. Either $\chi^2$ procedure would
take $k = 2$ and $\lambda = .6$, approximately. Using these values, we find that Rel. Eff.
($k = 2 \mid \lambda = .6$) = .9508, i.e., the loss in efficiency incurred by using a $\chi^2$
estimate rather than a ML estimate is of the order of five per cent.
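
The figure just quoted can be recomputed without tables. The following Python sketch is an illustration under stated assumptions, not the author's computation: it reads the Poisson relative efficiency as $P(k-1) + (\lambda - k)p(k-1) + \lambda[p(k-1)]^2/(1 - P(k-1))$, and cross-checks it against the defining ratio of expected squared derivatives of $\log f$ and $\log p$. The function names are ours.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability p(x | lam); taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)

def rel_eff_poisson(k, lam):
    """Closed form: P(k-1) + (lam - k) p(k-1) + lam p(k-1)^2 / (1 - P(k-1))."""
    p_km1 = poisson_pmf(k - 1, lam)
    P_km1 = sum(poisson_pmf(x, lam) for x in range(k))
    return P_km1 + (lam - k) * p_km1 + lam * p_km1**2 / (1.0 - P_km1)

def rel_eff_poisson_direct(k, lam):
    """Ratio of informations computed straight from the definition."""
    # Classes x = 0, ..., k-1 keep their own score x/lam - 1.
    head = sum(poisson_pmf(x, lam) * (x / lam - 1.0) ** 2 for x in range(k))
    # The pooled class x >= k has probability f(k) and derivative p(k-1).
    f_k = 1.0 - sum(poisson_pmf(x, lam) for x in range(k))
    info_truncated = head + poisson_pmf(k - 1, lam) ** 2 / f_k
    info_original = 1.0 / lam
    return info_truncated / info_original

print(rel_eff_poisson(2, 0.6), rel_eff_poisson_direct(2, 0.6))
```

For $k = 2$ and $\lambda = .6$ both functions agree with the value .9508 given in the text to four decimals.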

The binomial p. l. is given by $p(x \mid n, \theta) = \binom{n}{x}\theta^x(1 - \theta)^{n-x}$, $x = 0, 1, \cdots, n$,
where $n$ is a known parameter and $\theta$ is the parameter to be estimated from a
sample of $N$ observations. We obtain directly $E[(\partial \log p(x))/\partial\theta]^2 = n/(\theta(1 - \theta))$.
Computing a similar quantity for the truncated p. l. and making use of the
notations $p(x; n) = \binom{n}{x}\theta^x(1 - \theta)^{n-x}$ and $P(a; n) = \sum_{x=0}^{a} p(x; n)$, we obtain, after
some reduction,


(4.4)  Rel. Eff.$_{\text{binomial}}(k) = \dfrac{\theta}{1 - \theta}\left[(n - 1)P(k - 3; n - 2) + \dfrac{1 - 2n\theta}{\theta}P(k - 2; n - 1) + nP(k - 1; n) + \dfrac{n\{P(k - 1; n) - P(k - 2; n - 1)\}^2}{1 - P(k - 1; n)}\right]$.

The form (4.4) is suitable for computation if tables, such as Pearson’s Tables
[5], of the incomplete B-function are available covering a range up to the
parameter $n$. If such tables are not available, (4.4) is inconvenient since it involves
probabilities associated with three different binomial laws. In this case we may
use the relations

       $P(a; n) - P(a - 1; n - 1) = (1 - \theta)\,p(a; n - 1)$,

(4.5)  $p(a; n) = \dfrac{n\theta}{a}\,p(a - 1; n - 1)$ and

       $p(a; n - 1) = \dfrac{(n - a)\theta}{a(1 - \theta)}\,p(a - 1; n - 1)$

to obtain the alternative form 

(4.6)  Rel. Eff.$_{\text{binomial}}(k) = P(k - 1; n - 1) + (n\theta - k)\,p(k - 1; n - 1) + \dfrac{n\theta(1 - \theta)[p(k - 1; n - 1)]^2}{1 - P(k - 1; n - 1) + \theta\,p(k - 1; n - 1)}$,

which involves only the one binomial p. l., $p(x \mid n - 1, \theta)$.

As an example, consider the probability situation in which ten independent 





trials are made, each with the same probability of success, $\theta$. The number of
successes in each set of ten trials is one observation. On the basis of $N$
observations, we are to estimate $\theta$. We shall investigate the relative efficiencies when
$\theta = .10$. Taking $n = 10$ and $\theta = .10$ in (4.6) we compute the following table of
relative efficiencies for different choices of $k$:

Relative efficiencies of $\chi^2$ estimates in the case of the binomial p. l., $n = 10$, $\theta = .10$


    k     Rel. Eff.
    2     .8993
    3     .9828
    4     .9979
    5     .9998


It is obvious from the table that the loss in efficiency is not great when $k \ge 3$
and, hence, the variances of the $\chi^2$ estimates are practically equal to the variance
of the ML estimate. But, in ordinary practice, $N$, the number of sets of ten trials
each, would have to be over 140 before $k$ could be safely chosen as large as
$k = 3$, and even $k = 2$ requires $N \ge 38$. Cases in which we seek to estimate
parameters on the basis of about 100 observations are not rare; in the present
instance, use of a $\chi^2$ estimate would produce about 11% greater variance than
the use of a ML estimate.
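
The table can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions rather than the author's own computation: `rel_eff_closed` reads the alternative form as $P(k-1; n-1) + (n\theta - k)p(k-1; n-1) + n\theta(1-\theta)[p(k-1; n-1)]^2/(1 - P(k-1; n-1) + \theta p(k-1; n-1))$, and `rel_eff_direct` evaluates the defining ratio of expected squared derivatives for the truncated and the original law. The function names are ours.

```python
from math import comb

def binom_pmf(x, n, theta):
    """Binomial probability p(x; n); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

def binom_cdf(a, n, theta):
    """P(a; n) = sum of p(x; n) for x = 0..a."""
    return sum(binom_pmf(x, n, theta) for x in range(0, a + 1))

def rel_eff_closed(k, n, theta):
    """Closed form involving only the binomial law with index n - 1."""
    p = binom_pmf(k - 1, n - 1, theta)
    P = binom_cdf(k - 1, n - 1, theta)
    return P + (n * theta - k) * p + n * theta * (1 - theta) * p * p / (1.0 - P + theta * p)

def rel_eff_direct(k, n, theta):
    """Ratio of informations computed straight from the definition."""
    def score(x):
        # derivative of log p(x; n) with respect to theta
        return (x - n * theta) / (theta * (1.0 - theta))
    head = sum(binom_pmf(x, n, theta) * score(x) ** 2 for x in range(k))
    f_k = 1.0 - binom_cdf(k - 1, n, theta)          # pooled class x >= k
    df_k = -sum(binom_pmf(x, n, theta) * score(x) for x in range(k))
    info_truncated = head + df_k**2 / f_k
    info_original = n / (theta * (1.0 - theta))
    return info_truncated / info_original

for k in range(2, 6):
    print(k, round(rel_eff_closed(k, 10, 0.10), 4), round(rel_eff_direct(k, 10, 0.10), 4))
```

Both routes give .8993, .9828, .9979 and .9998 for $k = 2, 3, 4, 5$; the reciprocal of .8993 is about 1.11, which is the 11% increase in variance mentioned above.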

The two elementary examples considered in this section provide only very
fragmentary evidence of the need for caution in employing $\chi^2$-minimum
estimates; much numerical work would have to be done to provide any reliable guide
to the relative efficiency of such estimates.


5. Estimation of two or more parameters. Consider the p. l. $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
$x = 0, 1, 2, \cdots$, with ellipsoid of concentration for a set of joint
efficient estimates given by (2.3). The truncated p. l. given by (1.1) has a
corresponding ellipsoid of concentration

(2.3′)  $N \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma'_{ij} t_i t_j = r + 2$,

with $\sigma'_{ij} = E\left[\dfrac{\partial \log f(x)}{\partial \theta_i} \dfrac{\partial \log f(x)}{\partial \theta_j}\right]$. We show, in this section, that the
ellipsoid (2.3) lies wholly within (2.3′); this is so if the left member of (2.3) is
uniformly greater than the left member of (2.3′), for every choice of the $t_i$, $i = 1, 2, \cdots, r$.
Accordingly, we form the difference,

(5.1)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} (\sigma_{ij} - \sigma'_{ij})\, t_i t_j$.

Adopting the notations, $p_i(x) = \partial p(x)/\partial\theta_i$ and $f_i(k) = \partial f(k)/\partial\theta_i$,





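we obtain by direct subtraction,

(5.2)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[\sum_{x=k}^{\infty} \dfrac{p_i(x)p_j(x)}{p(x)} - \dfrac{f_i(k)f_j(k)}{f(k)}\right] t_i t_j$.

Equation (5.2) is unchanged if the right member is written in the form

(5.3)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[\sum_{x=k}^{\infty} \left\{\dfrac{p_i(x)p_j(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\,p_j(x) - \dfrac{f_j(k)}{f(k)}\,p_i(x) + \dfrac{f_i(k)f_j(k)}{f(k)}\,\dfrac{p(x)}{f(k)}\right\}\right] t_i t_j$.

If this latter is now written as

(5.4)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} f(k) \left[\dfrac{1}{f(k)} \sum_{x=k}^{\infty} \left\{\left(\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right)\left(\dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)}\right)\right\} p(x)\right] t_i t_j$,

it is evident that the expression in square brackets in the right hand member is
precisely the mean value of the expression in curly brackets taken over the set
$x \ge k$. If we denote by $E_{x \ge k}\{g(x)\}$ the expected value of $g(x)$ over the set $x \ge k$,
we have

(5.5)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} f(k)\, E_{x \ge k}\left\{\left(\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right)\left(\dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)}\right)\right\} t_i t_j$.

Finally, since the (finite) sum of the expected values is equal to the expected
value of the sum, we have,

(5.6)  $Q(k) = f(k)\, E_{x \ge k}\left\{\sum_{i=1}^{r} \left[\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right] t_i\right\}^2$.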


Since $f(k) > 0$, $Q(k) \ge 0$. We need only note that $Q(k) = 0$ only if the linear form
in curly brackets in (5.6) is identically zero, i.e., if each coefficient of $t_i$ vanishes.
This can happen only in the trivial cases analogous to those described in
Section 3.

It has been shown that the ellipsoid of concentration of a set of joint
efficient estimates of the parameters of a p. l. lies wholly within the corresponding
ellipsoid of the truncated p. l. Therefore, the best procedure for estimating the
parameters of a truncated p. l. cannot attain the precision of an efficient
procedure for estimating those of the original p. l.

In order to complete the argument for the general case, we form the difference 


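(5.7)  $Q(k) - Q(k + 1) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[\dfrac{p_i(k)p_j(k)}{p(k)} - \dfrac{f_i(k)f_j(k)}{f(k)} + \dfrac{f_i(k + 1)f_j(k + 1)}{f(k + 1)}\right] t_i t_j$.

Making use of the two relationships $f(k) = p(k) + f(k + 1)$ and $f_i(k) = p_i(k) + f_i(k + 1)$, we have

(5.8)  $Q(k) - Q(k + 1) = \dfrac{p(k)f(k + 1)}{f(k)} \left\{\sum_{i=1}^{r} \left[\dfrac{p_i(k)}{p(k)} - \dfrac{f_i(k + 1)}{f(k + 1)}\right] t_i\right\}^2$.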



The right member of (5.8) being positive except in the trivial cases, it is clear
that $Q(k)$ is a strictly monotone function of $k$.


6. Conclusions. It has been shown that the efficiency of $\chi^2$-minimum estimates,
or any other estimates which involve computation in terms of a truncated p. l.,
is necessarily less than the efficiency of corresponding ML or other estimates
based on the original p. l. and, further, that the efficiency increases with the
point of truncation. This was established for estimates of a single parameter and,
also, for joint estimates of several parameters. Examples given indicate that,
in any case of regular estimation, use of $\chi^2$-minimum estimates rather than ML
estimates should be accompanied by an investigation into the loss in efficiency.
The author is indebted to Professor J. Neyman, who suggested the problem.

REFERENCES 

[1] H. Cramér, “Contributions to the theory of statistical estimation,” Skandinavisk
Aktuarietidskrift, Vol. 29 (1946), pp. 85-94.

[2] E. C. Molina, Poisson’s Exponential Binomial Limit, Van Nostrand, 1942.

[3] J. Neyman, “Contribution to the theory of the $\chi^2$ test,” Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, 1949, pp. 239-273.

[4] K. Pearson, Tables of the Incomplete $\Gamma$-function, Cambridge University Press, 1922.

[5] K. Pearson, Tables of the Incomplete B-function, Cambridge University Press, 1934.

[6] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical
parameters,” Bull. Calcutta Math. Soc., Vol. 37 (1945), pp. 81-91.



UNBIASED ESTIMATES WITH MINIMUM VARIANCE 

By Charles Stein 
University of California, Berkeley 

Summary. Subject to certain restrictions, a characterization of unbiased
estimates with minimum variance is obtained. For two fairly broad classes
of problems, solutions are given which are more readily applicable. These are
used to obtain such estimates in some particular cases. The applicability of
the results to problems of sequential estimation is pointed out. The problem 
of unbiased estimation is not at present of much practical importance, but 
is of some theoretical interest and has been treated by many statisticians. Also, 
the method used in this paper may be applicable to other problems in statistics. 

1. Introduction. Let $R$ be a space of points $x$, $B$ an additive class of subsets $C$
of $R$ and $\mu$ a measure over $B$ such that $R$ can be represented as the union of a
countable collection of elements of $B$ each of which has finite $\mu$-measure. Let $\Omega$
be a set called the parameter space and let $X$ be a random variable distributed
in accordance with the probability density function $p(x \mid \theta)$ for some $\theta \in \Omega$,
so that for any $C \in B$

$P[X \in C \mid \theta] = \int_C p(x \mid \theta)\, d\mu(x)$.

A measurable real-valued function $f(x)$ on $R$ is called an unbiased estimate of the
real-valued function $g(\theta)$ on $\Omega$ if, for every $\theta \in \Omega$

(1)  $E(f(X) \mid \theta) = \int f(x)p(x \mid \theta)\, d\mu(x) = g(\theta)$.

The problem considered in this paper is that of finding an unbiased estimate
$f^*$ of $g$ which minimizes the variance at $\theta_0$. Since this variance is

(2)  $E\{[f(X) - g(\theta_0)]^2 \mid \theta_0\} = \int [f(x) - g(\theta_0)]^2 p(x \mid \theta_0)\, d\mu(x)$

     $= \int [f(x)]^2 p(x \mid \theta_0)\, d\mu(x) - \left[\int f(x)p(x \mid \theta_0)\, d\mu(x)\right]^2$,

this problem is equivalent to minimizing

(3)  $\int [f(x)]^2 p(x \mid \theta_0)\, d\mu(x)$

subject to (1). It will be convenient to introduce the measure

(4)  $\nu(C) = \int_C p(x \mid \theta_0)\, d\mu(x)$

and the probability ratios

(5)  $\pi(x \mid \theta) = \dfrac{p(x \mid \theta)}{p(x \mid \theta_0)}$.


We suppose $\pi(x \mid \theta)$ finite for almost all $x$, and all $\theta$. When we say “for almost
all $x$,” we mean “except for a set of $\mu$-measure 0.”

In most practical problems, the set $R$ is a subset of some finite-dimensional
Euclidean space and $\mu$ is either ordinary Lebesgue measure or, in the case where
$R$ is countable, counting measure which makes the measure of a set the number
of points it contains. An exception is the application to sequential analysis
considered in section 3 below, in which $R$ is a countable union of sets, each of
which is a subset of a finite dimensional Euclidean space. For the basic notation
and concepts of the theory of integration see Saks [2], Ch. I.

We shall define

(6)  $A(\theta_1, \theta_2) = \int \pi(x \mid \theta_1)\pi(x \mid \theta_2)\, d\nu(x)$,

and suppose

(7)  $A(\theta, \theta) < \infty$ for all $\theta$.


By Schwarz’s inequality this implies that $A(\theta_1, \theta_2) < \infty$ for all $\theta_1, \theta_2$. If (7)
is not true then it may happen that there exists no unbiased estimate with
minimum variance even though there exist unbiased estimates. Consider, for
example, the case where $\Omega$ consists of two points, 0 and 1, and $g(\theta) = \theta$, and

$p(x \mid 0) = 1$ for $0 < x < 1$, and 0 otherwise,

$p(x \mid 1) = \tfrac{1}{2}x^{-1/2}$ for $0 < x < 1$, and 0 otherwise,

and $\mu$ is ordinary Lebesgue measure. It is clear that there exist unbiased estimates
of $\theta$ with arbitrarily small positive variance at $\theta = 0$ but there exists none with 0
variance.


2. The principal theorem. In accordance with the usual terminology we
denote by $L_2$ the class of all measurable functions $\phi$ such that

(8)  $\int [\phi(x)]^2\, d\nu(x) < \infty$.

Finally, $G$ is the class of all functions $\psi$ expressible in the form

(9)  $\psi(\theta) = \int \phi(x)\pi(x \mid \theta)\, d\nu(x)$ with $\phi \in L_2$.

Theorem 1. If $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$, and (7) is satisfied,
and there exists an unbiased estimate of $g$, then there exists an unbiased estimate




$f^*$ of $g$ which minimizes (3). If $f^*$ has finite variance then any other unbiased
estimate of $g$ with minimum variance at $\theta_0$ is essentially equal to $f^*$, that is, differs
from $f^*$ only on a set of $\mu$-measure 0. A function $f$ is an unbiased estimate of $g$ with
minimum variance at $\theta_0$ if and only if there exists a real-valued functional $T$ on $G$
for which

(10)  $TA(\theta, \theta_1) = g(\theta_1)$ for all $\theta_1 \in \Omega$,

(11)  $T\int \phi(x)\pi(x \mid \theta)\, d\nu(x) = \int \phi(x)f(x)\, d\nu(x)$ for all $\phi \in L_2$.

(The preceding sentence does not assume the existence of an unbiased estimate of $g$.)
The minimum variance is $Tg(\theta) - [g(\theta_0)]^2$.

Proof. Let $\{f_i\}$ be a sequence of unbiased estimates of $g$ such that

$\lim_{i \to \infty} \int [f_i(x)]^2\, d\nu(x) = \text{g.l.b.} \int [f(x)]^2\, d\nu(x)$

where $f$ ranges over all unbiased estimates of $g$. Then by the weak compactness
of every sphere in $L_2$ (see [1], p. 10) there exists $f^* \in L_2$ and an increasing sequence
$\{n_i\}$ of integers for which

$\int \phi f^*\, d\nu = \lim_{i \to \infty} \int \phi f_{n_i}\, d\nu$ for all $\phi \in L_2$.

Since $\pi(x \mid \theta) \in L_2$ by (7), this implies that $f^*$ is an unbiased estimate of $g$. Also

(12)  $\int [f^*]^2\, d\nu \le \lim_{i \to \infty} \int f_{n_i}^2\, d\nu = \text{g.l.b.} \int f^2\, d\nu$.

Thus $f^*$ is an unbiased estimate of $g$ with minimum variance.

Let $\phi_1 \in L_2$ be such that

(13)  $\int \phi_1(x)\pi(x \mid \theta)\, d\nu(x) = 0$ for all $\theta \in \Omega$.

Then, using the $f^*$ defined in the last paragraph, we obtain for any real $\epsilon$

(14)  $0 \le \int (f^* + \epsilon\phi_1)^2\, d\nu - \int [f^*]^2\, d\nu = 2\epsilon \int \phi_1 f^*\, d\nu + \epsilon^2 \int \phi_1^2\, d\nu$

since $f^* + \epsilon\phi_1$ is an unbiased estimate of $g$. Dividing (14) by $\epsilon$ and letting $\epsilon \to 0$
we obtain

(15)  $\int \phi_1 f^*\, d\nu = 0$.

If a function in $G$ can be represented in two ways,

$\int \phi(x)\pi(x \mid \theta)\, d\nu(x) = \int \phi'(x)\pi(x \mid \theta)\, d\nu(x)$,

then $\phi_1 = \phi' - \phi$ satisfies (13) and consequently (15). Thus (11) defines a
functional on $G$ in a consistent way. Also, this functional satisfies (10) since

$TA(\theta, \theta_1) = T\int \pi(x \mid \theta_1)\pi(x \mid \theta)\, d\nu(x) = \int \pi(x \mid \theta_1)f^*(x)\, d\nu(x) = g(\theta_1)$.

By (2) and (11) the minimum variance is

$\int [f^*(x)]^2\, d\nu(x) - [g(\theta_0)]^2 = T\int f^*(x)\pi(x \mid \theta)\, d\nu(x) - [g(\theta_0)]^2 = Tg(\theta) - [g(\theta_0)]^2$.

To prove the converse, let $f^*$ be any function in $L_2$ for which there exists a
functional $T$ satisfying (10) and (11). By (11) with $\phi(x) = \pi(x \mid \theta_1)$,

$\int f^*(x)\pi(x \mid \theta_1)\, d\nu(x) = T\int \pi(x \mid \theta)\pi(x \mid \theta_1)\, d\nu(x) = TA(\theta, \theta_1) = g(\theta_1)$

by (10), so that $f^*$ is an unbiased estimate of $g$. Any other unbiased estimate $f$
of $g$ with finite variance at $\theta_0$ is an element of $L_2$. Thus from (1) and (11)
we obtain

$Tg(\theta) = \int f f^*\, d\nu = \int [f^*]^2\, d\nu$.

Applying Schwarz’s inequality to the middle expression we obtain

$\int [f^*]^2\, d\nu \le \int [f]^2\, d\nu$

with strict inequality unless $f$ is essentially equal to $f^*$.

Corollary 1. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Let $H_1(x, d)$ be the set of all $\theta \in \Omega$ such that $\pi(x \mid \theta) > d$, and let $H$ be the smallest
additive class containing all $H_1(x, d)$. Suppose there exists an additive set function $\lambda$
over $H$ such that there exists a finite collection of parameter points $\theta_k$ and positive
numbers $c_k$ such that

(16)  $\int \pi(x \mid \theta)\, |d\lambda(\theta)| \le \sum_k c_k\, \pi(x \mid \theta_k)$

for almost all $x$, and

(17)  $\int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1)$.

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(18)  $f^*(x) = \int \pi(x \mid \theta)\, d\lambda(\theta)$

and the minimum variance is

(19)  $\int g(\theta)\, d\lambda(\theta) - [g(\theta_0)]^2$.

Proof: We need only show that (10) and (11) are satisfied by

$T\psi = \int \psi(\theta)\, d\lambda(\theta)$

and (18). But

(20)  $TA(\theta, \theta_1) = \int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1)$

by (17) and

(21)  $T\int \phi(x)\pi(x \mid \theta)\, d\nu(x) = \int d\lambda(\theta) \int \phi(x)\pi(x \mid \theta)\, d\nu(x)$

      $= \int \phi(x)\, d\nu(x) \int \pi(x \mid \theta)\, d\lambda(\theta) = \int \phi(x)f^*(x)\, d\nu(x)$.

Since each of the functions $\phi(x)$, $\pi(x \mid \theta)$ considered as a function of $x$ and $\theta$ is
measurable ($BH$), their product is also. The interchange of order of integration
in (21) is justified by Fubini’s Theorem (Saks [2], p. 87) and (16) which by (9)
implies that $\int |d\lambda(\theta)| \int |\phi(x)\pi(x \mid \theta)|\, d\nu(x) < \infty$. The equations (20) and (21)
are equivalent to (10) and (11) respectively.

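Corollary 2. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Suppose also that $\Omega$ is a set of real numbers and:

(i) for some $m$, either a positive integer or $+\infty$, $\pi(x \mid \theta)$ is, for almost all $x$,
differentiable $m$ times with respect to $\theta$ at $\theta = \theta_0$,

(ii) for each $n < m$ there exists a finite collection of parameter values $\theta_{n,k}$ and
positive constants $c_{n,k}$ such that

(22)  $\left|\dfrac{\pi^{(n)}(x \mid \theta_0 + \delta) - \pi^{(n)}(x \mid \theta_0)}{\delta}\right| \le \sum_k c_{n,k}\, \pi(x \mid \theta_{n,k})$

for all $\delta$ whose absolute value is sufficiently small and almost all $x$,

(iii) there exist constants $a_n$ such that for all $\theta_1$,

(23)  $g(\theta_1) = \sum_{n=0}^{m} a_n \left[\dfrac{\partial^n}{\partial\theta^n} A(\theta, \theta_1)\right]_{\theta=\theta_0}$,

(iv) there exists a finite collection of parameter values $\theta_k$ and positive constants
$c_k$ such that

(24)  $\sum_{n=0}^{m} \left|a_n \left[\dfrac{\partial^n}{\partial\theta^n}\pi(x \mid \theta)\right]_{\theta=\theta_0}\right| \le \sum_k c_k\, \pi(x \mid \theta_k)$.

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(25)  $f^*(x) = \sum_{n=0}^{m} a_n \left[\dfrac{\partial^n}{\partial\theta^n}\pi(x \mid \theta)\right]_{\theta=\theta_0}$.

The minimum variance is

(26)  $\sum_{n=0}^{m} a_n \left[\dfrac{d^n g(\theta)}{d\theta^n}\right]_{\theta=\theta_0} - [g(\theta_0)]^2$.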


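Proof. We need only show that the functional $T$ defined by

$T\int \phi(x)\pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n \left[\dfrac{\partial^n}{\partial\theta^n} \int \phi(x)\pi(x \mid \theta)\, d\nu(x)\right]_{\theta=\theta_0}$

satisfies (10) and (11) with $f^*$ given by (25). Equation (23) yields (11) immediately.
Also

$T\int \phi(x)\pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n \left[\dfrac{\partial^n}{\partial\theta^n} \int \phi(x)\pi(x \mid \theta)\, d\nu(x)\right]_{\theta=\theta_0}$

$= \sum_{n=0}^{m} a_n \int \phi(x) \left[\dfrac{\partial^n}{\partial\theta^n}\pi(x \mid \theta)\right]_{\theta=\theta_0} d\nu(x)$

by (9), (i), (22) and Lebesgue’s Theorem on term by term integration (Saks
[2] p. 29). Using (24) and Lebesgue’s Theorem, we find that this is equal to

$\int \phi(x) \sum_{n=0}^{m} a_n \left[\dfrac{\partial^n}{\partial\theta^n}\pi(x \mid \theta)\right]_{\theta=\theta_0} d\nu(x) = \int \phi(x)f^*(x)\, d\nu(x)$,

which completes the proof.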

There is an obvious combination of Corollaries 1 and 2 which will not be 
stated explicitly. Also Corollary 2 can be extended to involve differentiation 
with respect to several parameters. It would be of considerable interest to 
obtain a characterization of all possible functionals T in terms of the usual 
operations such as integration and differentiation. Also, the methods used here 
should be applicable, with some modifications, to other problems of minimization 
subject to an infinite set of side conditions. 

Corollary 3. Suppose that subject to the condition of Theorem 1, for $i = 1, 2$,
$f_i^*$ are unbiased estimates of $g_i$ with minimum variance at $\theta_0$. Then $f_1^* + f_2^*$ is
an unbiased estimate of $g_1 + g_2$ with minimum variance at $\theta_0$.

This follows immediately from (11) and (12) in Theorem 1. Actually, the 
restriction to problems satisfying the conditions of Theorem 1 is unnecessary, 
but we shall not prove this here. 





3. Some special cases. We first consider a problem which is of little practical
interest but serves well as an illustration of Corollary 1. Let $X$ be a single
observation from a uniform distribution on the interval $(\theta, \theta + 1)$, i.e.

$p(x \mid \theta) = 1$ if $\theta < x < \theta + 1$, and 0 otherwise.

We suppose $\theta$ lies in the interval $(-N, N - 1)$ where $N$ is a given positive
integer, and take as the distribution for which the variance is to be minimized

$p(x \mid \theta_0) = \dfrac{1}{2N}$ if $-N < x < N$, and 0 otherwise.

This is the same as using the original p.d.f. $p(x \mid \theta)$ with $\theta$ a random variable
taking on the values $-N, -N + 1, \cdots, N - 1$ with equal probability. The
measure $\mu$ is of course ordinary Lebesgue measure. Then


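(27)  $\pi(x \mid \theta) = 2N$ if $\theta < x < \theta + 1$, and 0 otherwise,

and

(28)  $\dfrac{1}{2N}A(\theta_1, \theta_2) = \begin{cases} 0 & \text{if } \theta_1 \le \theta_2 - 1 \\ \theta_1 - \theta_2 + 1 & \text{if } \theta_2 - 1 < \theta_1 \le \theta_2 \\ \theta_2 - \theta_1 + 1 & \text{if } \theta_2 < \theta_1 < \theta_2 + 1 \\ 0 & \text{if } \theta_2 + 1 \le \theta_1. \end{cases}$

For $-N < \theta_1 < N - 1$, equation (17) becomes

(29)  $\int_{\max(-N,\, \theta_1 - 1)}^{\theta_1} (\theta - \theta_1 + 1)\, d\lambda(\theta) + \int_{\theta_1}^{\min(N - 1,\, \theta_1 + 1)} (\theta_1 - \theta + 1)\, d\lambda(\theta) = g(\theta_1)/2N$

and (18) becomes

(30)  $f^*(x)/2N = \lambda(\min[N - 1, x]) - \lambda(\max[-N, x - 1])$.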

The reader will not be confused by the use of $\lambda$ as a point function here, and
as a set function in Corollary 1. Using (30) and integration by parts (Saks [2],
p. 102) we can rewrite (29) as

(31)  $\int_{\theta_1}^{\theta_1 + 1} f^*(x)\, dx = g(\theta_1)$,

which is merely the condition that $f^*$ be an unbiased estimate of $g$. It is clear
from (31) that $g$ admits an unbiased estimate if and only if it is absolutely



continuous. Differentiating (31) we obtain

(32)  $f^*(\theta + 1) - f^*(\theta) = g'(\theta)$.

Consequently the general solution of (31) is

(33)  $f^*(\theta) = \sum_{i=1}^{[\theta] + N} g'(\theta - i) + \gamma(\theta)$,

where $\gamma$ is a function of period 1 such that

(34)  $\int_0^1 \gamma(\theta)\, d\theta = 0$.

Here, contrary to the usual convention, $[\theta]$ denotes the largest integer less than $\theta$.
The one of (33) which minimizes the variance at $\theta_0$ is determined by the condition
that there exist $\lambda$ satisfying (30). Let $y$ be any number on the half-closed interval
$(-N, -N + 1]$, and sum (30) for $x = y, y + 1, \cdots, y + 2N - 1$. This yields

(35)  $\dfrac{1}{2N} \sum_{j=0}^{2N-1} f^*(y + j) = \lambda(N - 1) - \lambda(-N)$.


Carrying out the same computation on (33) we obtain 
T 2 W—1 i+tr 

(36) 4 Z Z 9'(y + j - i) + y(y) = UN - l) - \(-N). 

Ary 0 »-i 


Combining (34) and (35) we find that the proper choice of $\gamma$ is that which gives

[*]+tf+l 


f*{x) = Z ✓(* ~ M - N + *) 


,-Q 


(37) 


+ % (t -0 - w - * + 


-j 2AT-1 

+ 4 § i- 


If the limit of (37) as $N \to \infty$ exists, it agrees with Nörlund's simplest definitions
of the principal solution of (32) (see Milne-Thomson [3], formula (2), p. 201)
whenever the latter is applicable. The author has not checked the agreement
with Nörlund's more general definitions.
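As a small numerical illustration of (32) and (33), the sketch below checks that the sum in (33), taken with $\gamma = 0$, satisfies the difference equation (32). The test function $g(t) = t^3$, the value $N = 5$, and the helper names are assumptions chosen only for this illustration; they do not come from the paper.

```python
import math


def strict_floor(theta):
    """Largest integer strictly less than theta (the bracket [theta] of the text)."""
    return math.ceil(theta) - 1


def particular_solution(g_prime, theta, N):
    """Evaluate sum_{i=1}^{[theta]+N} g'(theta - i), i.e. (33) with gamma = 0."""
    upper = strict_floor(theta) + N
    return sum(g_prime(theta - i) for i in range(1, upper + 1))


def g_prime(t):
    # Hypothetical test function: g(t) = t**3, so g'(t) = 3 * t**2.
    return 3.0 * t * t


N = 5
checks = []
for theta in (-2.5, -1.0, 0.25, 3.0):
    # Left side of (32): f*(theta + 1) - f*(theta).
    difference = (particular_solution(g_prime, theta + 1, N)
                  - particular_solution(g_prime, theta, N))
    checks.append((theta, difference, g_prime(theta)))
    print(theta, difference, g_prime(theta))
```

The two printed columns after each $\theta$ should agree up to rounding, including at integer $\theta$, where the unusual bracket convention matters.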

Next we consider the problem of obtaining an unbiased estimate of $g(\theta)$ with
minimum variance at $\theta_0$ when $X$ consists of $n$ independent observations, each
uniformly distributed over the interval $(0, \theta)$. Here $\theta$ is an unknown positive
number. The result is independent of the choice of $\theta_0$. Clearly a necessary and
sufficient condition for the existence of an unbiased estimate of $g$ is that $g$ be
absolutely continuous. Corollary 1 can be applied to obtain as the best unbiased
estimate $g(Y) + \frac{Y}{n} g'(Y)$, where $Y = \max(X_1, \dots, X_n)$. However, this result can
be obtained much more simply by observing that, given any sufficient statistic $Z$,





CHARLES STEIN 


there exists an unbiased estimate with minimum variance which is a function
only of $Z$. A proof of this is given by Blackwell [4]. But $Y$ is a sufficient statistic,
and the condition that $f^*(Y)$ be an unbiased estimate of $g$ is that

$$\frac{n}{\theta^n} \int_0^\theta f^*(y)\, y^{n-1}\, dy = g(\theta).$$

This has as its unique solution that given above.
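The unbiasedness condition above can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $g(\theta) = \theta^2$, $n = 4$ and $\theta = 1.7$ (none of which appear in the paper). It integrates an estimator against the density $n y^{n-1}/\theta^n$ of the maximum $Y$ and compares the plain plug-in $g(Y)$ with the estimate $g(Y) + (Y/n)\,g'(Y)$.

```python
def simpson(func, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def expectation_under_max(estimator, theta, n):
    """E[estimator(Y)] for Y = max of n independent uniforms on (0, theta)."""
    def integrand(y):
        density = n * y ** (n - 1) / theta ** n  # density of the maximum
        return estimator(y) * density
    return simpson(integrand, 0.0, theta)


def g(t):
    return t ** 2  # hypothetical target function


def g_prime(t):
    return 2 * t


n, theta = 4, 1.7
naive = expectation_under_max(g, theta, n)
corrected = expectation_under_max(lambda y: g(y) + (y / n) * g_prime(y), theta, n)
print(naive, corrected, g(theta))
```

Under these assumptions the plug-in estimate has expectation $\frac{n}{n+2}\theta^2$, while the corrected estimate reproduces $g(\theta)$ up to quadrature error.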

A similar situation holds when the $X_i$, $i = 1, \dots, n$, are independently normally
distributed with unknown common mean $\theta$ and unit variance. Here Corollary 1
is not applicable, but Corollary 2 is. The result can again be obtained more
simply as the unique solution of the integral equation

$$\sqrt{\frac{n}{2\pi}} \int_{-\infty}^{\infty} f_0^*(y)\, e^{-\frac{n}{2}(y - \theta)^2}\, dy = g(\theta)$$

with

$$f^*(x_1, \dots, x_n) = f_0^*(y), \qquad y = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

It should be observed that the methods of section 2 are applicable also to
problems of sequential estimation. Let $X_1, X_2, \dots$ be a sequence of real-valued
random variables such that $(X_1, \dots, X_n)$ have the joint p.d.f. $p_n(x_1, \dots, x_n \mid \theta)$
for some unknown $\theta \in \Omega$. Suppose it has been decided to terminate the procedure
on the $m$th observation if $(X_1, \dots, X_m) \in R_m$ for some given sets $R_m$ in $m$-space,
and suppose these sets are so chosen that the probability of termination is 1
for all $\theta$. Then we can define the space $R = \bigcup_m R_m$, the union of the $R_m$, the
measure

$$\mu(A) = \sum_m \mu_m(A \cap R_m)$$

for any set $A \subset R$ for which the intersections $A \cap R_m$ are Borel sets, where $\mu_m$ is
ordinary $m$-dimensional Lebesgue measure, and the probability density functions

$$p(x \mid \theta) = p_m(x_1, \dots, x_m \mid \theta) \quad \text{if } x = (x_1, \dots, x_m) \in R_m.$$

The previous results are then applicable. Most of the familiar results in the 
theory of statistical inference can be extended to sequential problems in the 
same way. Of course the interesting and difficult problems of sequential analysis 
are usually concerned chiefly with the appropriate choice of the regions R m . 

4. Connections with the work of other authors. Many lower bounds for the
variance of an unbiased estimate were obtained by Bhattacharyya [5], and
some results were obtained earlier by others whose results are referred to by
Bhattacharyya. His work has been extended to sequential problems as indicated
in section 3 above by G. R. Seth in a doctoral dissertation at Columbia University.
This leads to results analogous to, but in some respects more general
than, those of Wolfowitz [6]. Among other papers on sequential estimation,





there are the one by Blackwell [4] already referred to, and the one by Girshick, 
Mosteller, and Savage [7]. These deal mainly with problems in which there is a 
unique unbiased estimate based on a sufficient statistic. 

The author is indebted to A. Wald, J. L. Hodges, E. Barankin, and H. Rubin
for some helpful suggestions and comments.

REFERENCES 

[1] B. v. Sz. Nagy, “Spektraldarstellung linearer Transformationen des Hilbertschen
Raumes,” Ergebnisse der Mathematik, Vol. 5, No. 5 (1942).

[2] S. Saks, Theory of the Integral, Monografie Matematyczne, Tom VII, Warsaw, 1937.

[3] L. M. Milne-Thomson, The Calculus of Finite Differences, Macmillan, London, 1933.

[4] D. Blackwell, “Conditional expectation and unbiased sequential estimation,” Annals
of Math. Stat., Vol. 18 (1947), p. 105.

[5] A. Bhattacharyya, “On some analogues of the amount of information and their use in
statistical estimation,” Sankhyā, Vol. 8 (1946), p. 1 and Vol. 8 (1947), p. 201.

[6] J. Wolfowitz, “The efficiency of sequential estimates and Wald’s equation for sequential
processes,” Annals of Math. Stat., Vol. 18 (1947), p. 215.

[7] M. Girshick, F. Mosteller, and L. Savage, “Unbiased estimates for certain binomial
sampling problems,” Annals of Math. Stat., Vol. 17 (1946), p. 13.





DISTRIBUTION OF MAXIMUM AND MINIMUM FREQUENCIES IN A 
SAMPLE DRAWN FROM A MULTINOMIAL DISTRIBUTION 

By Robert E. Greenwood and Mark O. Glasgow

University of Texas 


1. Introduction. In this paper, the expected values

(1.1) $$E\left[{\max \atop \min}(n_1, n_2, \dots, n_l)\right] = \sum_{n_1 + n_2 + \cdots + n_k = N} \frac{N!}{n_1!\, n_2! \cdots n_k!} \left[{\max \atop \min}(n_1, n_2, \dots, n_l)\right] p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

will be studied. The quantities $\{n_i\}$, $i = 1, 2, \dots, k$, are understood to be
non-negative integers, and the quantities $\{p_i\}$ are non-negative probabilities,
$\sum p_i = 1$. Also, $l \le k$. Form (1.1) will be evaluated for the binomial case $l = k = 2$
and for the special trinomial case $p_1 = p_2$ with $l = 2$, $k = 3$.
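For small $N$ the sum (1.1) can be evaluated by brute force, which gives a reference value against which the closed forms of the later sections can be compared. The sketch below is an illustration only; the function names `compositions` and `expected_extreme` are hypothetical, and the enumeration cost grows quickly with $N$ and $k$.

```python
from math import factorial, prod


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def expected_extreme(N, probs, l, pick=max):
    """Evaluate (1.1): E[pick(n_1, ..., n_l)] for a multinomial (N; probs)."""
    k = len(probs)
    expectation = 0.0
    for counts in compositions(N, k):
        coeff = factorial(N) // prod(factorial(c) for c in counts)
        weight = coeff * prod(p ** c for p, c in zip(probs, counts))
        expectation += pick(counts[:l]) * weight
    return expectation


# Binomial case l = k = 2 and a trinomial case with l = 2, k = 3.
print(expected_extreme(2, [0.5, 0.5], 2, min), expected_extreme(2, [0.5, 0.5], 2, max))
print(expected_extreme(5, [0.2, 0.3, 0.5], 2, min))
```

For $l = 2$ the minimum and maximum always add up to $n_1 + n_2$, so the two expectations must sum to $N(p_1 + p_2)$; this gives a quick sanity check on any evaluation.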


2. Binomial distribution. The evaluations for the expected values in the
binomial case can be given explicitly in terms of the incomplete Beta function.
This function may be defined by the relation

(2.1) $$I_q(n - k, k + 1) = \sum_{r=0}^{k} \binom{n}{r} (1 - q)^r q^{n-r},$$

whence

$$I_{1-q}(k + 1, n - k) = \sum_{r=k+1}^{n} \binom{n}{r} (1 - q)^r q^{n-r}.$$

It is seen that

(2.2) $$I_q(n - k, k + 1) + I_{1-q}(k + 1, n - k) = 1.$$
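Relation (2.1) identifies a binomial partial sum with the incomplete Beta function in its usual integral form, $I_x(a, b) = \frac{1}{B(a,b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt$. The sketch below compares the two numerically for one illustrative set of values ($n = 7$, $k = 2$, $q = 0.6$, chosen arbitrarily here) and also checks (2.2). The integral is evaluated by Simpson's rule, which is an assumption of this sketch and not a method used in the paper.

```python
from math import comb, factorial


def binomial_sum(q, n, lo, hi):
    """sum_{r=lo}^{hi} C(n, r) (1 - q)^r q^(n - r)."""
    return sum(comb(n, r) * (1 - q) ** r * q ** (n - r) for r in range(lo, hi + 1))


def beta_integral(x, a, b, steps=2000):
    """I_x(a, b) for positive integers a, b by Simpson's rule on the defining integral."""
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    beta = factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)
    return total * h / 3 / beta


n, k, q = 7, 2, 0.6
lower = binomial_sum(q, n, 0, k)        # right side of (2.1)
upper = binomial_sum(q, n, k + 1, n)    # the complementary sum
print(lower, beta_integral(q, n - k, k + 1))
print(lower + upper)
```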

For the binomial case, $n_2 = N - n_1$ and $p_2 = 1 - p_1$, and thus instead of
$(n_1, n_2)$ and $(p_1, p_2)$ one may use $(n, N - n)$ and $(p, 1 - p)$ without any
subscripts and without sacrifice of clarity. This will be done in some instances in what
follows. The evaluation of

(2.3) $$E\left[{\max \atop \min}(n, N - n)\right] = \sum_{n=0}^{N} \binom{N}{n} \left[{\max \atop \min}(n, N - n)\right] p^n (1 - p)^{N-n}$$

is slightly different for the two cases $N$ odd and $N$ even.

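For $N$ odd, and for the minimum form, the summation may be written in two
parts, (a) and (b),

(a) $$0 \le n \le \frac{N - 1}{2},$$

in which range $\min(n, N - n) = n$, and

(b) $$\frac{N + 1}{2} \le n \le N,$$

in which range $\min(n, N - n) = N - n$. In the (a) part summation one gets

$$\sum_{n=0}^{(N-1)/2} \binom{N}{n} n\, p^n (1 - p)^{N-n} = Np \sum_{n=1}^{(N-1)/2} \binom{N-1}{n-1} p^{n-1} (1 - p)^{N-n} = Np\, I_{1-p}\left(\frac{N+1}{2}, \frac{N-1}{2}\right).$$

In the (b) part summation one gets

$$\sum_{n=(N+1)/2}^{N} \binom{N}{n} (N - n)\, p^n (1 - p)^{N-n} = \sum_{n=(N+1)/2}^{N-1} \binom{N-1}{n} N(1 - p)\, p^n (1 - p)^{N-n-1} = N(1 - p)\, I_p\left(\frac{N+1}{2}, \frac{N-1}{2}\right).$$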
Similar algebraic manipulations, supplemented by symmetry, can be used to 
effect the evaluations tabulated below. 

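For $N$ odd there result the forms

$$E[\min(n_1, n_2)] = Np\, I_{1-p}\left(\frac{N+1}{2}, \frac{N-1}{2}\right) + N(1 - p)\, I_p\left(\frac{N+1}{2}, \frac{N-1}{2}\right),$$

$$E[\max(n_1, n_2)] = Np\, I_p\left(\frac{N-1}{2}, \frac{N+1}{2}\right) + N(1 - p)\, I_{1-p}\left(\frac{N-1}{2}, \frac{N+1}{2}\right).$$

For $N$ even there result the forms

(2.4) $$E[\min(n_1, n_2)] = Np\, I_{1-p}\left(\frac{N}{2}, \frac{N}{2}\right) + N(1 - p)\, I_p\left(\frac{N}{2} + 1, \frac{N}{2} - 1\right),$$

(2.5) $$E[\max(n_1, n_2)] = Np\, I_p\left(\frac{N}{2}, \frac{N}{2}\right) + N(1 - p)\, I_{1-p}\left(\frac{N}{2} - 1, \frac{N}{2} + 1\right).$$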

For this simple binomial case, $\max(n_1, n_2) + \min(n_1, n_2) = N$ and linearity
in the expected value operator used in (2.3) preserves this relation, so that one
obtains

(2.6) $$E[\min(n_1, n_2)] + E[\max(n_1, n_2)] = N.$$

Thus (2.6) and (2.2) could have been used in evaluating some of the forms above,
or can be used as a check on the evaluations.
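The tabulated forms can be checked against direct summation of (2.3). The sketch below is one way to do so, assuming integer arguments and using the binomial-tail identity $I_x(a, b) = \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^j (1-x)^{a+b-1-j}$ to evaluate the incomplete Beta function. The function names are hypothetical, and the convention that an empty sum stands for the degenerate case $b = 0$ is an assumption of this sketch.

```python
from math import comb


def reg_inc_beta(x, a, b):
    """I_x(a, b) for non-negative integers a, b via the binomial tail identity."""
    m = a + b - 1
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a, m + 1))


def closed_form(N, p):
    """E[min] and E[max] from the tabulated forms (odd N, or (2.4)-(2.5) for even N)."""
    q = 1 - p
    if N % 2:  # N odd
        hi, lo = (N + 1) // 2, (N - 1) // 2
        e_min = N * p * reg_inc_beta(q, hi, lo) + N * q * reg_inc_beta(p, hi, lo)
        e_max = N * p * reg_inc_beta(p, lo, hi) + N * q * reg_inc_beta(q, lo, hi)
    else:      # N even
        h = N // 2
        e_min = N * p * reg_inc_beta(q, h, h) + N * q * reg_inc_beta(p, h + 1, h - 1)
        e_max = N * p * reg_inc_beta(p, h, h) + N * q * reg_inc_beta(q, h - 1, h + 1)
    return e_min, e_max


def direct(N, p):
    """E[min] and E[max] by summing (2.3) term by term."""
    weights = [comb(N, n) * p ** n * (1 - p) ** (N - n) for n in range(N + 1)]
    e_min = sum(w * min(n, N - n) for n, w in enumerate(weights))
    e_max = sum(w * max(n, N - n) for n, w in enumerate(weights))
    return e_min, e_max


for N in (3, 4, 7, 10):
    print(N, closed_form(N, 0.3), direct(N, 0.3))
```

Adding the two entries of either pair should return $N$, which is relation (2.6).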





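To compute the variance

(2.7) $$\sigma^2(x) = E[\{x - E[x]\}^2] = E[x^2] - \{E[x]\}^2,$$

it will be convenient to note that for the binomial case

(2.8) $$\sigma^2_{\max} = \sigma^2_{\min},$$

where

(2.9) $$\sigma^2_{{\max \atop \min}} = E\left[\left\{{\max \atop \min}(n_1, n_2) - E\left[{\max \atop \min}(n_1, n_2)\right]\right\}^2\right],$$

and where because of the non-negative character of $n_1$ and $n_2$

$$E\left[\left\{{\max \atop \min}(n_1, n_2)\right\}^2\right] = E\left[{\max \atop \min}(n_1^2, n_2^2)\right].$$

To prove (2.8), note that for this binomial case

$$\{\max(n_1, n_2) - E[\max(n_1, n_2)]\}^2 = \{\min(n_1, n_2) - E[\min(n_1, n_2)]\}^2,$$

and thus each term for $\sigma^2_{\max}$ has its counterpart for $\sigma^2_{\min}$ when using the first part
of (2.7) to compute these variances, and hence (2.8) must be true.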

Defining $\sigma^2$ as the common value, one gets

(2.10) $$2\sigma^2 = \sigma^2_{\max} + \sigma^2_{\min} = E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] - \{E[\max(n_1, n_2)]\}^2 - \{E[\min(n_1, n_2)]\}^2.$$

The value of the sum

$$E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)]$$

is somewhat easier to obtain than that of either part. For, $\max(n_1^2, n_2^2)$ is one
of the integers $(n_1^2, n_2^2)$ and $\min(n_1^2, n_2^2)$ is the other integer. Linearity in the
expected value form then gives

(2.11) $$E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] = E[n^2 + (N - n)^2] = N^2 p^2 + 2Np(1 - p) + N^2 (1 - p)^2,$$

a relation which is similar to (2.6).

Likewise one gets

(2.12) $$\begin{aligned} \{E[\max(n_1, n_2)]\}^2 + \{E[\min(n_1, n_2)]\}^2 &= \{E[\max(n_1, n_2)] + E[\min(n_1, n_2)]\}^2 - 2E[\max(n_1, n_2)]\,E[\min(n_1, n_2)] \\ &= N^2 - 2E[\max(n_1, n_2)]\,E[\min(n_1, n_2)]. \end{aligned}$$

Substituting the results of (2.11) and (2.12) into (2.10), and solving for $\sigma^2$ one
gets

(2.13) $$\begin{aligned} \sigma^2 &= E[\max(n_1, n_2)]\,E[\min(n_1, n_2)] - N(N - 1)p(1 - p) \\ &= E[\max(n_1, n_2)]\{N - E[\max(n_1, n_2)]\} - N(N - 1)p(1 - p) \\ &= E[\min(n_1, n_2)]\{N - E[\min(n_1, n_2)]\} - N(N - 1)p(1 - p). \end{aligned}$$

If one desires, one can make independent evaluations of $E[\max(n_1^2, n_2^2)]$ and
$E[\min(n_1^2, n_2^2)]$ and compute the variances from relation (2.9). Such evaluations
bring into play the incomplete Beta functions at four different sets of values,
with separate sets for $N$ odd and $N$ even. Relations (2.13) seem preferable to
this suggested “strong-arm” procedure. A proof of relation (2.8) by this means
seems to be unduly algebraically complicated.
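Relations (2.8) and (2.13) are exact identities for the binomial case, so they can be confirmed by direct enumeration. The following sketch does this for one arbitrary choice, $N = 9$ and $p = 0.35$ (values assumed here for illustration only); the helper name `moments` is hypothetical.

```python
from math import comb


def moments(N, p, pick):
    """Mean and variance of pick(n, N - n) for n ~ Binomial(N, p), by direct summation."""
    weights = [comb(N, n) * p ** n * (1 - p) ** (N - n) for n in range(N + 1)]
    mean = sum(w * pick(n, N - n) for n, w in enumerate(weights))
    second = sum(w * pick(n, N - n) ** 2 for n, w in enumerate(weights))
    return mean, second - mean ** 2


N, p = 9, 0.35
mean_max, var_max = moments(N, p, max)
mean_min, var_min = moments(N, p, min)

# First line of (2.13): product of the two means minus N(N - 1)p(1 - p).
from_2_13 = mean_max * mean_min - N * (N - 1) * p * (1 - p)
print(var_max, var_min, from_2_13)
```

All three printed numbers should coincide up to rounding.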


3. Normal approximation to the binomial distribution. If numerical values for 
large N are desired (beyond the range of tabulated values of the incomplete 
Beta Function) an approximation based on the normal distribution may be used. 
Let

(3.1) $$n_1 = Np_1 + x, \qquad n_2 = N - n_1 = N(1 - p_1) - x,$$

where the subscripts may be dropped when not needed for clarity.
Then one has

(3.2) $$E\left[{\max \atop \min}(n_1, n_2)\right] \cong \frac{1}{\sqrt{2\pi Np(1 - p)}} \int_{-\infty}^{\infty} {\max \atop \min}\bigl(x + Np,\ N(1 - p) - x\bigr) \exp\left[\frac{-x^2}{2Np(1 - p)}\right] dx.$$

To evaluate the minimum approximation, note that there are two ranges

(a) $$-\infty < x < \frac{N}{2} - Np,$$

in which range $\min(x + Np, N(1 - p) - x) = x + Np$,

(b) $$\frac{N}{2} - Np < x < \infty,$$

in which range $\min(x + Np, N(1 - p) - x) = N(1 - p) - x$. Defining

(3.3) $$A(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-u^2/2}\, du,$$

a tabulated function, the integrations may be evaluated as

(3.4) $$\begin{aligned} E[\min(n_1, n_2)] &\cong Np\,A(M) + N(1 - p)[1 - A(M)] - \sqrt{\frac{2Np(1 - p)}{\pi}}\, \exp\left[\frac{-N(1 - 2p)^2}{8p(1 - p)}\right], \\ E[\max(n_1, n_2)] &\cong N(1 - p)\,A(M) + Np[1 - A(M)] + \sqrt{\frac{2Np(1 - p)}{\pi}}\, \exp\left[\frac{-N(1 - 2p)^2}{8p(1 - p)}\right], \end{aligned}$$

where

$$M = \frac{N/2 - Np}{\sqrt{Np(1 - p)}}.$$

Note also that (2.6) holds for these approximate evaluations.
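To see how close the approximations (3.4) come to the exact binomial values, the sketch below evaluates both for $N = 400$ and $p = 0.5$ (illustrative values assumed here). It takes $A$ to be the standard normal distribution function, computed through `math.erf`; the function names are hypothetical.

```python
from math import comb, erf, exp, pi, sqrt


def normal_cdf(t):
    """Standard normal distribution function, playing the role of A(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def approx_min_max(N, p):
    """Normal approximations (3.4) to E[min(n1, n2)] and E[max(n1, n2)]."""
    q = 1 - p
    M = (N / 2 - N * p) / sqrt(N * p * q)
    A = normal_cdf(M)
    tail = sqrt(2 * N * p * q / pi) * exp(-N * (1 - 2 * p) ** 2 / (8 * p * q))
    e_min = N * p * A + N * q * (1 - A) - tail
    e_max = N * q * A + N * p * (1 - A) + tail
    return e_min, e_max


def exact_min(N, p):
    """Exact E[min(n, N - n)] by direct summation of (2.3)."""
    return sum(comb(N, n) * p ** n * (1 - p) ** (N - n) * min(n, N - n)
               for n in range(N + 1))


N, p = 400, 0.5
approx_min, approx_max = approx_min_max(N, p)
print(approx_min, exact_min(N, p))
print(approx_min + approx_max)  # relation (2.6) for the approximations
```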
For the variance, approximations (3.4) may be used in relations (2.13). Or,
alternately, the variances may be computed by “strong-arm” methods using
the definition (2.9). In this case, using the averaging defined implicitly by (2.10)
one gets the evaluation

(3.5) $$\begin{aligned} \sigma^2 \cong{} & N^2 A(M)[1 - A(M)][1 - 2p]^2 + Np(1 - p) \\ & + N(1 - 2p)[1 - 2A(M)] \sqrt{\frac{2Np(1 - p)}{\pi}}\, \exp\left[\frac{-N(1 - 2p)^2}{8p(1 - p)}\right] \\ & - \frac{2Np(1 - p)}{\pi}\, \exp\left[\frac{-N(1 - 2p)^2}{4p(1 - p)}\right]. \end{aligned}$$

It would seem preferable to use relations (2.13) rather than the above; for that
reason the evaluations of forms (2.9) have not been included here.


4. Trinomial distributions. The form

(4.1) $$E\left[{\max \atop \min}(n_1, n_2)\right] = \sum_{n_1 + n_2 + n_3 = N} \frac{N!}{n_1!\, n_2!\, n_3!} \left[{\max \atop \min}(n_1, n_2)\right] p_1^{n_1} p_2^{n_2} p_3^{n_3}$$

may be approximated, for large $N$, by the bivariate normal distribution. Suppose
two attributes $P$ (and not-$P$ = $\bar{P}$) and $R$ (and not-$R$ = $\bar{R}$) are being observed
in a distribution. Then the four possible outcomes of an experiment
could be represented as the categories $PR$, $P\bar{R}$, $\bar{P}R$, $\bar{P}\bar{R}$ with respective
probabilities $a, b, c, d$; $a + b + c + d = 1$. In such a situation, for large $N$, one may use
a bivariate normal distribution as a limiting form of the above described bivariate
binomial distribution, or multinomial distribution with four categories.

If the probability of one category, say $PR$, is zero, the bivariate normal
distribution can be regarded as a limiting form of a trinomial distribution.

Indeed, defining

(4.2) $$x_1 = \frac{n_1 - Np_1}{[Np_1(1 - p_1)]^{1/2}}, \qquad x_2 = \frac{n_2 - Np_2}{[Np_2(1 - p_2)]^{1/2}},$$

the bivariate normal distribution takes the form [1]

(4.3) $$dF = \frac{1}{2\pi(1 - r^2)^{1/2}} \exp\left\{-\frac{1}{2(1 - r^2)}\left(x_1^2 - 2r x_1 x_2 + x_2^2\right)\right\} dx_1\, dx_2, \qquad -\infty < x_1, x_2 < \infty,$$

where

$$r = -\left[\frac{p_1 p_2}{(1 - p_1)(1 - p_2)}\right]^{1/2}.$$

The expected values are then given approximately by

(4.4) $$E\left[{\max \atop \min}(n_1, n_2)\right] \cong \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\max \atop \min}(n_1, n_2)\, dF.$$

For the special case $p_1 = p_2$, evaluations have been made of $E\left[{\max \atop \min}(n_1, n_2)\right]$
by the authors. For the finite summation (4.1), powers of $N$ less than the one-half
power were neglected, and the values

(4.5) $$E[\min(n_1, n_2)] = Np - \left(\frac{Np}{\pi}\right)^{1/2}, \qquad E[\max(n_1, n_2)] = Np + \left(\frac{Np}{\pi}\right)^{1/2}$$

were obtained.

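For the integral case, again for $p_1 = p_2 = p$ and hence for $r = -p/(1 - p)$,
the evaluation proceeds as follows. In virtue of (4.2) and (4.3)

(4.6) $$\begin{aligned} E[\min(n_1, n_2)] &= Np + [Np(1 - p)]^{1/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \min(x_1, x_2)\, dF \\ &= Np + [Np(1 - p)]^{1/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \min(x_1 - x_2, 0)\, dF. \end{aligned}$$

It is convenient to introduce a rotation of axes in order to evaluate integral
(4.6). Indeed, rotation through $\pi/4$ radians will give

(4.7) $$x_1 = \frac{y_1}{\sqrt{2}} - \frac{y_2}{\sqrt{2}}, \qquad x_2 = \frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{2}},$$

with

(4.8) $$x_1^2 + \frac{2p}{1 - p}\, x_1 x_2 + x_2^2 = y_1^2 \left(\frac{1}{1 - p}\right) + y_2^2 \left(\frac{1 - 2p}{1 - p}\right),$$

(4.9) $$\min(x_1 - x_2, 0) = \min(-y_2 \sqrt{2}, 0),$$

(4.10) $$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = 1.$$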





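Thus integral (4.6) becomes

(4.11) $$\begin{aligned} E[\min(n_1, n_2)] &= Np + [Np(1 - p)]^{1/2} \left[\frac{(1 - p)^2}{1 - 2p}\right]^{1/2} \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{1 - p}{1 - 2p}\, y_1^2 + (1 - p)\, y_2^2\right)\right] \min(-y_2 \sqrt{2}, 0)\, dy_1\, dy_2 \\ &= Np + \left[\frac{Np(1 - p)^3}{1 - 2p}\right]^{1/2} \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left[-\frac{(1 - p)}{2(1 - 2p)}\, y_1^2\right] \left(\int_0^{\infty} -\sqrt{2}\, y_2 \exp\left[-\frac{(1 - p)}{2}\, y_2^2\right] dy_2\right) dy_1. \end{aligned}$$

As indicated above, it is convenient to consider the form as an iterated integral,
and integrate first with respect to $y_2$. The evaluation of (4.11) presents no serious
difficulties,

(4.12) $$\begin{aligned} E[\min(n_1, n_2)] &= Np - \left[\frac{Np(1 - p)^3}{2(1 - 2p)}\right]^{1/2} \frac{1}{\pi(1 - p)} \int_{-\infty}^{\infty} \exp\left[-\frac{(1 - p)}{2(1 - 2p)}\, y_1^2\right] dy_1 \\ &= Np - \left(\frac{Np}{\pi}\right)^{1/2}. \end{aligned}$$

Likewise

$$E[\max(n_1, n_2)] = Np + \left(\frac{Np}{\pi}\right)^{1/2}.$$

Note that these values are the same as those obtained from the finite summation
form (4.1), as given by (4.5).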
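For moderate $N$ the finite sum (4.1) can be evaluated exactly and set beside the large-$N$ value $Np \mp (Np/\pi)^{1/2}$. The sketch below does this for the special case $p_1 = p_2 = p$, with $N = 60$ and $p = 0.3$ assumed purely for illustration. Since terms of lower order in $N$ were neglected in deriving the approximation, only rough agreement should be expected.

```python
from math import comb, pi, sqrt


def trinomial_expectation(N, p, pick):
    """Exact E[pick(n1, n2)] from (4.1) with p1 = p2 = p and p3 = 1 - 2p."""
    p3 = 1 - 2 * p
    total = 0.0
    for n1 in range(N + 1):
        for n2 in range(N - n1 + 1):
            n3 = N - n1 - n2
            # N! / (n1! n2! n3!) written as a product of binomial coefficients.
            weight = comb(N, n1) * comb(N - n1, n2) * p ** (n1 + n2) * p3 ** n3
            total += pick(n1, n2) * weight
    return total


N, p = 60, 0.3
exact_min = trinomial_expectation(N, p, min)
exact_max = trinomial_expectation(N, p, max)
approx_min = N * p - sqrt(N * p / pi)
approx_max = N * p + sqrt(N * p / pi)
print(exact_min, approx_min)
print(exact_max, approx_max)
```

The exact values satisfy $E[\min] + E[\max] = 2Np$, just as the approximate ones do.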

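To evaluate the variance

(4.13) $$\sigma^2_{\min} = E[\min(n_1^2, n_2^2)] - \{E[\min(n_1, n_2)]\}^2,$$

a finite summation form similar to (4.1) or an integral form similar to (4.4) may
be used.

In case the integral form is used, it is convenient to introduce the variables
$x_1$ and $x_2$ as defined by (4.2). One then gets

(4.14) $$\begin{aligned} E[\min(n_1^2, n_2^2)] &= N^2 p^2 + Np(1 - p)\, E\left[\min\left(x_1^2 + 2x_1\left[\frac{Np}{1 - p}\right]^{1/2};\ x_2^2 + 2x_2\left[\frac{Np}{1 - p}\right]^{1/2}\right)\right] \\ &= N^2 p^2 + Np(1 - p) + Np(1 - p)\, E\left[\min\left\{(x_1 - x_2)\left(x_1 + x_2 + 2\left[\frac{Np}{1 - p}\right]^{1/2}\right);\ 0\right\}\right], \end{aligned}$$

in which one integration over the whole space has been carried out. Rotating
axes as per (4.7) one gets

(4.15) $$E[\min(n_1^2, n_2^2)] = N^2 p^2 + Np(1 - p) + 2Np(1 - p)\, E\left[\min\left\{-y_2\left(y_1 + \left[\frac{2Np}{1 - p}\right]^{1/2}\right);\ 0\right\}\right].$$

In evaluating this last expected value form, the region of integration may be
considered as a sum of separate regions. Over some regions the integrand is zero,
in other regions the non-negative product

$$y_2\left(y_1 + \left[\frac{2Np}{1 - p}\right]^{1/2}\right)$$

is the integrand and this condition gives

$$\left\{y_2 > 0,\ y_1 > -\left[\frac{2Np}{1 - p}\right]^{1/2}\right\} \quad \text{and} \quad \left\{y_2 < 0,\ y_1 < -\left[\frac{2Np}{1 - p}\right]^{1/2}\right\}$$

as the regions of integration with the non-negative product as integrand.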

Since the assumption that $N$ is large has already been made, it is convenient
to approximate further here and assume $[2Np/(1 - p)]^{1/2}$ is large, and in particular
to assume that integration from $-[2Np/(1 - p)]^{1/2}$ to $+\infty$ is equal to integration
from $-\infty$ to $+\infty$ for the integrand under consideration and for iterated
integration with respect to the variable $y_1$.

Remark: An equivalent assumption is needed in the finite summation case
when approximating $(Np)!$ by the use of Stirling’s formula.

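Thus one gets (since one of the above regions of integration is to be neglected)

(4.16) $$\begin{aligned} E\left[\min\left\{-y_2\left(y_1 + \left[\frac{2Np}{1 - p}\right]^{1/2}\right);\ 0\right\}\right] &\cong \frac{-(1 - p)}{2\pi(1 - 2p)^{1/2}} \int_{-\infty}^{\infty} \int_0^{\infty} y_2 \left(y_1 + \left[\frac{2Np}{1 - p}\right]^{1/2}\right) \exp\left[-\frac{(1 - p)}{2(1 - 2p)}\, y_1^2 - \frac{(1 - p)}{2}\, y_2^2\right] dy_2\, dy_1 \\ &= \frac{-1}{2\pi(1 - 2p)^{1/2}} \int_{-\infty}^{\infty} \left(y_1 + \left[\frac{2Np}{1 - p}\right]^{1/2}\right) \exp\left[-\frac{(1 - p)}{2(1 - 2p)}\, y_1^2\right] dy_1 \\ &= -\frac{1}{1 - p} \left(\frac{Np}{\pi}\right)^{1/2}. \end{aligned}$$

Collecting results from (4.13), (4.15) and (4.16) one obtains

(4.17) $$\sigma^2_{\min} \cong Np\left(1 - p - \frac{1}{\pi}\right).$$

By a similar procedure, one may compute also that

(4.18) $$\sigma^2_{\max} \cong Np\left(1 - p - \frac{1}{\pi}\right).$$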





For this three category case, the proof used to obtain relation (2.8) is no longer
applicable, yet the relation $\sigma^2_{\max} = \sigma^2_{\min}$ still holds for the approximating
relations given above.

5. Conclusion. Since the normal distribution was used in some instances to
obtain approximations for the binomial and multinomial distributions, many
of the maximum and minimum relations stated as approximations for the
multinomial are exact for the appropriate normal distribution.

No convenient formulation was found for the general trinomial case ($p_1$,
$p_2$, $p_3$ unequal) similar to relations (4.5), (4.17), and (4.18).

As possible applications of the general solution of this problem, the referee
has kindly supplied the authors with a reference of Guttman [2]. Sampling
theory provided by the general solution to this problem could be used in
connection with Guttman’s reliability coefficient.

REFERENCES 

[1] M. G. Kendall, The Advanced Theory of Statistics, Vol. I, 3rd edition, Charles Griffin
and Co., 1947, p. 133.

[2] Louis Guttman, “The test-retest reliability of qualitative data,” Psychometrika, Vol.
11 (1946), pp. 81-95.



DERIVATION OF A BROAD CLASS OF CONSISTENT ESTIMATES 

By R. C. Davis 

U. S. Naval Ordnance Test Station, Inyokern, California

1. Summary. Given a chance vector $X$ with distribution function $F(X, \theta_T)$,
where $\theta_T$ denotes the true unknown parameter vector, a broad class of estimates
of $\theta_T$ is derived which is shown to be identical with the class of all consistent
estimates of $\theta_T$. A sub-class is obtained each member of which has the following
properties: a) its construction depends upon the solution of an equation
involving a single vector function of the parameter vector $\theta$ and the members of
a sequence $\{X_n\}$ of independent and identically distributed chance vectors;
b) the estimate so obtained converges almost certainly to $\theta_T$; c) it is a symmetric
function of the members of the sequence $\{X_n\}$. In order to obtain this
sub-class it is postulated that a function of $X$ and $\theta$ exists (continuous in $\theta$ for a
certain neighborhood of the true parameter $\theta_T$ and existing for each $X$ in a
subset of the sample space) which satisfies a Lipschitz condition in $\theta$. In particular
if a density function $f(X, \theta_T)$ exists satisfying certain conditions, the consistency
of the maximum likelihood estimate can be established under regularity
conditions quite different from those usually assumed [1]. This is not to be interpreted
as a weakening of the usual regularity conditions but rather as an extension of
the class of consistent likelihood estimates obtained under the usual regularity
conditions.

2. Introduction. The present work is the result of investigations into the
following question posed by J. Neyman: What happens to the asymptotic
properties of the maximum likelihood estimate of $\theta_T$ when the usual regularity
conditions on $F(X, \theta)$ are relaxed? The consistency and efficiency of the
estimate are the properties in question, and the present work arose from the
observation that consistency at least can be obtained under conditions much
different than those usually assumed [1]. The assumptions made below are
existential in nature, and no general methods are given for the actual construction
of consistent estimates. As stated above, however, the results of this work can
be used to widen the class of consistent maximum likelihood estimates established
heretofore. Although simple upper and lower bounds for the variance of a
consistent estimate are obtained, no answer is given to the question of determining
the efficiency of such an estimate. In regard to consistent estimates, J. Neyman
and E. Scott have discussed recently [2] the need for a systematic method of
obtaining consistent estimates. Wald has given necessary and sufficient conditions
[3] for the existence of a uniformly consistent estimate of an unknown
parameter $\theta$ when there exists a density function continuous jointly in all of its
arguments, and it is assumed that the domain of each of the unknown parameters
is a closed and bounded set. It is hoped that the class of consistent estimates
derived below will help shed some light on a general method for actually
obtaining such estimates. In this connection it is important to point out that if
necessary and sufficient conditions were known for the existence and uniqueness
of a fixed point for a transformation on $E_m$ to $E_m$, the weakest possible conditions
could be expressed for the existence of consistent estimates obtained in the
manner given below. It is surmised that the use of a Hölder condition of order
one as presented below is stronger than required.

Let $\{X_i\}$, $i = 1, 2, \dots, n, \dots$, be a sequence of chance vectors in which
$X_i$ possesses the probability distribution function $F_i(X, \theta)$ depending upon an
unknown parameter vector $\theta$. The vector $X$ has components $X_i$, $i = 1, 2, \dots, s$,
where $X_i$ is a chance variable, and $\theta$ has components $\theta_j$, $j = 1, 2, \dots, m$. The
problem is to obtain a function of the $X_i$ which is a consistent estimate of $\theta$.
We denote by $E_s$ the real Euclidean space of $s$ dimensions and by $E_s'$ a subset
of $E_s$ excluding at most a set of probability measure zero. For convenience we
use the symbol $\| \theta \|$ to denote the norm of $\theta$, where

$$\| \theta \| = (\theta_1^2 + \theta_2^2 + \cdots + \theta_m^2)^{1/2}.$$

We define in a similar manner the norm of any function which assumes values
in $E_m$. The following assumption is made:

Assumption 1. There exists a point $\theta_0$ and a neighborhood $W(\theta_0, a)$ of $\theta_0$ having
radius $a$ ($a > 0$) which contains the true parameter vector $\theta_T$ as an interior point
and there exists an infinite sequence of functions $G_n(X_1, X_2, \dots, X_n; \theta)$,
$n = 1, 2, \dots$, ad inf. on $E_s \times E_m$ to $E_m$ such that

(a) for each $n$ the equation

$$G_n(X_1, X_2, \dots, X_n; \theta) = 0$$

has a unique solution $\theta = \theta_n^*(X_1, X_2, \dots, X_n)$ in $W(\theta_0, a)$. (For the sake of
brevity we usually write $G_n(X; \theta) = G_n(X_1, X_2, \dots, X_n; \theta)$.)

(b) For every pair of values of $\theta_1$, $\theta_2$ in $W(\theta_0, a)$ and for some $K$ with $0 < K < 1$

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_1) - G_n(X, \theta_2) - (\theta_1 - \theta_2) \| \le K \| \theta_1 - \theta_2 \|\} = 1.$$

(c) For every $\epsilon > 0$,

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_T) \| < \epsilon\} = 1.$$

3. A consistent estimate of $\theta_T$.

Theorem 3.1. The solution $\theta = \theta_n^*(X_1, X_2, \dots, X_n)$ of the equation

$$G_n(X_1, X_2, \dots, X_n; \theta) = 0$$

is a consistent estimate of $\theta_T$, providing $G_n(X; \theta)$ satisfies Assumption 1.

Proof: From Assumption 1b it follows that given $\delta > 0$, we have for all
$n > N'(\delta)$,

(3.1) $$P\{\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \le K \| \theta_T - \theta_n^* \|\} > 1 - \frac{\delta}{2},$$

since $G_n(X, \theta_n^*) = 0$. It follows from (3.1) that for all $n > N'(\delta)$,

(3.2) $$P\left\{\frac{\| G_n(X, \theta_T) \|}{1 + K} \le \| \theta_T - \theta_n^* \| \le \frac{\| G_n(X, \theta_T) \|}{1 - K}\right\} > 1 - \frac{\delta}{2}.$$

From Assumption 1c it follows that there exists $N''(\epsilon, \delta)$ such that $n > N''(\epsilon, \delta)$
implies

(3.3) $$P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} > 1 - \frac{\delta}{2}.$$

(3.2), (3.3), and a familiar formula in probability imply for all
$n > \max[N'(\delta), N''(\epsilon, \delta)]$,

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} > 1 - \delta.$$

It is noted that (3.2) characterizes the speed of convergence of the estimate
$\theta_n^*$. The following uniqueness property is noted: If a given sequence of functions
$G_n(X_1, X_2, \dots, X_n; \theta)$ satisfies Assumption 1, then $\theta_T$ is the unique parameter
vector in $W(\theta_0, a)$ which satisfies item c of Assumption 1. The proof of this remark
is left to the reader.

The following remark demonstrates the extreme generality of the class of
consistent estimates obtained in the above manner: The set of estimates of the
parameter vector $\theta_T$ obtained from the class of all sequences of functions

$$G_n(X_1, X_2, \dots, X_n; \theta)$$

satisfying Assumption 1 is identical with the set of all consistent estimates of the
parameter vector $\theta_T$. The proof of this remark is quite obvious and is left to the
reader.


4. Properties of a sub-class of consistent estimates. The question arises
naturally concerning a general method for the construction of a sequence of
functions $G_n(X_1, X_2, \dots, X_n; \theta)$ satisfying Assumption 1. The author knows
of no general method. It is possible to describe a sub-class of the class of
consistent estimates, the construction of which depends upon the existence of one
function rather than a sequence of functions. This is possible by application
of the strong law of large numbers, and in this way consistent estimates of the
parameter vector are obtained which converge almost certainly to the true
value $\theta_T$. Moreover it is clear that under certain conditions the function

$$G_n(X_1, X_2, \dots, X_n; \theta_T)$$

defined as in equation (4.1) below is an asymptotically $m$-variate normal variable.

Assumption 2. Let $\{X_i\}$, $i = 1, 2, \dots, n, \dots$, be a sequence of independently
and identically distributed chance vectors with common distribution function $F(X; \theta)$,
where $\theta$ is again the unknown parameter vector.

Assumption 3. There exists a function $g(X, \theta)$ on $E_s \times E_m$ to $E_m$ such that

(a) for every $X \in E_s'$ and every distinct pair $(\theta_1, \theta_2)$ in $W(\theta_0, a)$,

$$\| g(X, \theta_1) - g(X, \theta_2) - (\theta_1 - \theta_2) \| \le K \| \theta_1 - \theta_2 \|,$$

where $0 < K < 1$ and $\| g(X, \theta_0) \| < (1 - K)a$;

(b) $$\int_{E_s} g(X, \theta_T)\, dF(X, \theta_T) = 0.$$

We define the function $G_n(X, \theta)$ as follows:

(4.1) $$G_n(X, \theta) = \frac{1}{n} \sum_{i=1}^{n} g(X_i, \theta).$$

The following lemmas are required:

Lemma 4.1. $G_n(X, \theta)$ as defined in (4.1) satisfies the conditions in Assumption
3 with $G_n(X, \theta)$ replacing $g(X, \theta)$.

The proof is sufficiently obvious to be omitted.

Lemma 4.2. $G_n(X, \theta_T) \to 0$ almost certainly as $n \to \infty$ if Assumptions 2 and
3b hold.

Proof: Since $E g(X_i, \theta_T) = 0$, $i = 1, 2, \dots, n$, and the chance variables
$g(X_i, \theta_T)$ are independently and identically distributed, this follows immediately
from a theorem due to Kolmogorov [5].

Theorem 4.1. If Assumptions 2 and 3 hold, then the equation $G_n(X, \theta) = 0$
has a unique solution $\theta = \theta_n^*(X_1, X_2, \dots, X_n)$ in $W(\theta_0, a)$, where $\theta_n^*$ is a
consistent estimate of $\theta_T$ and is moreover a symmetric function of the observation
vectors $X_1, X_2, \dots, X_n$.

Proof: We obtain the solution $\theta_n^*$ by the method of successive substitutions.
Define

$$\theta_1 = \theta_0 - G_n(X, \theta_0), \quad \dots, \quad \theta_{j+1} = \theta_j - G_n(X, \theta_j).$$

In view of Lemma 4.1 we can apply a well known existence theorem [4] in the
theory of functions to prove that the sequence $\{\theta_j\}$ converges to a limit $\theta_n^*$ which
is also in $W(\theta_0, a)$. The same theorem establishes the uniqueness of the solution
in $W(\theta_0, a)$. This uniqueness property together with Lemmas 4.1 and 4.2
establish the fact that the sequence $\{G_n(X, \theta)\}$ as defined in equation (4.1) satisfies
Assumption 1. It follows immediately from Theorem 3.1 that $\theta_n^*$ is a consistent
estimate of $\theta_T$. We can, however, prove a stronger relationship.

Theorem 4.2. The estimate $\theta_n^*$ defined in Theorem 4.1 converges almost certainly
to $\theta_T$.

Proof: From Lemma 4.2 we know that given any number $\epsilon > 0$, there exists
an integer $N(\epsilon)$ such that for all $n > N(\epsilon)$

$$P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1.$$

From Assumption 3a and Lemma 4.1 we see that

$$\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \le K \| \theta_T - \theta_n^* \|,$$

since $G_n(X, \theta_n^*) = 0$. Then

$$\| G_n(X, \theta_T) \| \ge (1 - K) \| \theta_T - \theta_n^* \|.$$

Clearly the set of $X \in E_s$ for which $\| \theta_T - \theta_n^* \| < \epsilon$ includes the set of $X$ for
which $\| G_n(X, \theta_T) \| < \epsilon(1 - K)$.

Therefore, for $n > N(\epsilon)$,

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} \ge P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1,$$

and the proof is completed.
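The method of successive substitutions used in the proof of Theorem 4.1 is easy to illustrate in one dimension. The sketch below is only an illustration under stated assumptions: the function $g(x, \theta) = (\theta - x) + k \sin(\theta - x)$ with $k = 0.3$ is a hypothetical choice, not one from the paper, picked because $g(x, \theta_1) - g(x, \theta_2) - (\theta_1 - \theta_2) = k[\sin(\theta_1 - x) - \sin(\theta_2 - x)]$ is bounded in absolute value by $k\,|\theta_1 - \theta_2|$, so the Lipschitz-type inequality of Assumption 3a holds with $K = k$. The remaining condition $\|g(X, \theta_0)\| < (1 - K)a$ is not checked here.

```python
import math


def g(x, theta, k=0.3):
    """Hypothetical estimating function satisfying the inequality of 3a with K = k."""
    return (theta - x) + k * math.sin(theta - x)


def G_n(sample, theta):
    """The average (4.1) of g over the sample."""
    return sum(g(x, theta) for x in sample) / len(sample)


def solve_by_substitution(sample, theta0, tol=1e-12, max_iter=200):
    """Iterate theta_{j+1} = theta_j - G_n(X, theta_j) until the step is negligible."""
    theta = theta0
    for _ in range(max_iter):
        step = G_n(sample, theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta


# A fixed sample symmetric about 2.0, so that G_n vanishes at theta = 2.0.
sample = [1.2, 1.7, 2.0, 2.3, 2.8]
theta_star = solve_by_substitution(sample, theta0=0.0)
print(theta_star, G_n(sample, theta_star))
```

Each substitution shrinks the distance to the root by a factor of at most $K$, which is why a few dozen iterations suffice here.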

The uniqueness of the parameter value $\theta_T$ in the neighborhood $W(\theta_0, a)$
follows immediately from the remark succeeding Theorem 3.1 since Assumption
1 is valid in Theorems 4.1 and 4.2.

It is interesting to note that the application of a theorem in the theory
of functions of a real variable gives the result that if the function $g(X, \theta)$ is
continuous on a bounded and closed set in $E_s \times E_m$ and if we take for $E_s'$ a
bounded and closed set, then $\theta_n^*(X_1, X_2, \dots, X_n)$ is a continuous function of
$X_1, X_2, \dots, X_n$ for $X_i \in E_s'$ ($i = 1, 2, \dots, n$). If we assume the continuity of
$g(X, \theta)$ in $X$ for each $\theta$ in $W(\theta_0, a)$ the following remark demonstrates an
interesting relationship concerning the uniqueness of the solution for $\theta$ in the
equation $E g(X, \theta) = 0$. If in addition to Assumption 3 we assume that $g(X, \theta)$ is
continuous in $X$ for every $X$ in $E_s'$ and every $\theta$ in $W(\theta_0, a)$ and if at least one
of the components $g_i(X, \theta)$, $1 \le i \le m$, of the $m$-dimensional vector function
$g(X, \theta)$ satisfies also a Lipschitz condition:

$$\| g_i(X, \theta_1) - g_i(X, \theta_2) - (\theta_1 - \theta_2) \| \le K \| \theta_1 - \theta_2 \|$$

for every distinct pair $\theta_1$, $\theta_2$ in $W(\theta_0, a)$, then for all $\theta$ in $W(\theta_0, a)$, $\theta_T$ is the unique
solution for $\theta$ of the equation $E g(X, \theta) = 0$.

The proof of this remark is left to the reader.

5. Upper and lower bounds for the expected squared error of $\theta_n^*(X_1, X_2, \dots, X_n)$.
Denote by $g_i(X, \theta)$, $i = 1, 2, \dots, m$, the $m$ components of the chance vector
$g(X, \theta)$. We now make an additional assumption.

Assumption 4.

$$E[g_i(X, \theta_T)\, g_j(X, \theta_T)] = \lambda_{ij}$$

exists for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, m$.

It follows from Assumptions 2, 3b, and 4 and the Lindeberg-Lévy form of the
Central Limit Theorem that the vector $\sqrt{n}\, G_n(X, \theta_T)$ tends in probability to an
$m$-variate normal distribution with means zero and moment matrix $(\lambda_{ij})$.

Now from Assumption 3a and Lemma 4.1

(5.1) $$\frac{\| G_n(X, \theta_T) \|}{1 + K} \le \| \theta_n^* - \theta_T \| \le \frac{\| G_n(X, \theta_T) \|}{1 - K}.$$

For convenience define

$$\lambda = \sum_{i=1}^{m} \lambda_{ii}.$$

We obtain then

$$E \| G_n(X, \theta_T) \|^2 = \frac{\lambda}{n}.$$

It follows then from equation (5.1) that

$$\frac{\lambda}{n(1 + K)^2} \le E \| \theta_n^* - \theta_T \|^2 \le \frac{\lambda}{n(1 - K)^2}.$$
6. The consistency of maximum likelihood estimates. The results of this
paper can be used to extend the class of consistent maximum likelihood estimates
established heretofore [1].¹ Assume that $F(X, \theta)$ admits a density function
$f(X, \theta)$ with the property

$$\frac{\partial}{\partial \theta} \int f(X, \theta)\, dX = \int \frac{\partial f}{\partial \theta}(X, \theta)\, dX.$$

Then

$$E\left[\frac{\partial}{\partial \theta} \ln f(X, \theta)\right] = 0.$$

The maximum likelihood estimate of $\theta_T$ is obtained by solving the equation

$$\frac{\partial}{\partial \theta} \ln L(X, \theta) = 0,$$

where

$$L(X, \theta) = \prod_{i=1}^{n} f(X_i, \theta).$$

If a sample $X_1, X_2, \dots, X_n$ is obtained as the result of $n$ random independent
drawings from the distribution having the c.d.f. $F(X, \theta)$, the sample values
will satisfy Assumption 2. Assumption 3b holds as assumed above. If we assume
also that the function $\frac{\partial}{\partial \theta} \ln f(X, \theta)$ satisfies Assumption 3a, it follows directly
from Theorem 4.2 that the maximum likelihood estimate converges almost
certainly to the true parameter vector as the sample size approaches infinity.

The author wishes to acknowledge his indebtedness and gratitude to Professor
Jerzy Neyman for the many helpful suggestions made during the preparation
of the paper.

REFERENCES

[1] J. L. Doob, “Probability and statistics,” Trans. Am. Math. Soc., Vol. 36 (1934), p. 759.

[2] J. Neyman and Elizabeth L. Scott, “Consistent estimates based on partially consistent
observations,” Econometrica, Vol. 16 (1948), pp. 1-32.

[3] A. Wald, “Estimation of a parameter when the number of unknown parameters increases
indefinitely with the number of observations,” Annals of Math. Stat.,
Vol. 19 (1948), pp. 220-227.

[4] L. M. Graves, The Theory of Functions of Real Variables, McGraw-Hill Book Co., 1946.

[5] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea Publishing
Co., 1946.

[6] A. Wald, “Note on the consistency of the maximum likelihood estimate,” Annals of
Math. Stat., Vol. 20 (1949), pp. 595-600.

[7] J. Wolfowitz, “On Wald’s proof of the consistency of the maximum likelihood estimate,”
Annals of Math. Stat., Vol. 20 (1949), pp. 601-602.

¹ Recently Wald [6] and Wolfowitz [7] have discussed the consistency of the maximum
likelihood estimate from another approach than the one employed by Doob.



DISTRIBUTION OF THE SUM OF ROOTS OF A DETERMINANTAL
EQUATION UNDER A CERTAIN CONDITION

By D. N. Nanda

University of North Carolina

1. Summary. This paper is in continuation of the author's first two papers
[1] and [2]. In this paper a method is described by which it is possible to derive
the distribution of the sum of roots of a certain determinantal equation under the
condition that m = 0. This condition implies, when the results are applied to
canonical correlations, that the numbers of variates in the two sets differ by
unity. The distributions for the sum of roots under this condition have been
obtained for l = 2, 3 and 4 and are given in this paper. This paper also derives
the moments of these distributions.

2. Introduction. The reader should refer to the first two papers of this series
[1] and [2] for detailed explanation of the preliminaries essential for this paper.

The distribution of any root of the determinantal equation, specified by its
rank when the roots are arranged in a descending order of magnitude, was
derived by the author [1]. The distribution of the largest root was expressed as

$$\Pr(\theta_1 < x) = C(l, m, n)\, F_{l,m,n}(x) = \text{const.}\ (0, l, l - 1, \cdots, 1, x;\ m, n). \tag{1}$$

3. Method. Putting $\theta_i = p_i/n$ in $R(l, m, n)$ as given in [1] and allowing n to
tend to infinity, the distribution density reduces to

$$R(l, m) = \text{const.} \prod_{i=1}^{l} p_i^{m} \prod_{i<j} (p_i - p_j)\, e^{-\sum p_i} \qquad (0 < p_l < p_{l-1} < \cdots < p_1 < \infty),$$

where the constant is independent of n, by [2]. If we replace x by x/n in the
right-hand side of (1) and allow n to tend to infinity, then the resulting function
$G_{l,m}(x)$ is independent of n and it can be shown by comparing the two methods
A and B in [2], that

$$\int_{0 < p_l < \cdots < p_1 < x} R(l, m) \prod dp_i = G_{l,m}(x). \tag{2}$$

This is a constant multiple of

$$\phi(x, m) = \int_{0 < p_l < \cdots < p_1 < x} \prod p_i^{m} \prod_{i<j} (p_i - p_j)\, e^{-\sum p_i} \prod dp_i = \text{const.}\ x^{\,l + lm + l(l-1)/2}\, \theta(x, m). \tag{3}$$

Putting $p_i = x y_i$, we have

$$\int_{0 < y_l < \cdots < y_1 < 1} \prod y_i^{m} \prod_{i<j} (y_i - y_j)\, e^{-x \sum y_i} \prod dy_i = \text{const.}\ \theta(x, m). \tag{4}$$

The left-hand side is proportional to the moment generating function for the
sum of roots when n = 0.

Let $y_1 = 1 - \theta_l$, $y_2 = 1 - \theta_{l-1}$, $\cdots$, $y_l = 1 - \theta_1$; then (4) gives

$$\int_{0 < \theta_l < \cdots < \theta_1 < 1} \prod (1 - \theta_i)^{m} \prod_{i<j} (\theta_i - \theta_j)\, e^{-lx + x \sum \theta_i} \prod d\theta_i = \text{const.}\ \theta(x, m). \tag{5}$$

Let m be changed to n and both sides be multiplied by $e^{lx}$, then we get

$$\int_{0 < \theta_l < \cdots < \theta_1 < 1} \prod (1 - \theta_i)^{n} \prod_{i<j} (\theta_i - \theta_j)\, e^{x \sum \theta_i} \prod d\theta_i = \text{const.}\ e^{lx}\, \theta(x, n). \tag{6}$$

The left-hand side of (6) is the moment generating function for the sum of roots
when m = 0.

The method for obtaining the probability distributions is described in detail
for each of the cases l = 2, 3 in the following sections.

It may, however, be added here that the condition m = 0 implies that
| p - q | = 1 in the case of canonical correlations. It also implies, in generalized
analysis of variance, that if we have K samples and measurements are made on p
characters then K - 1 and p should differ by unity. Thus the distribution is
given for 5 samples and 3 characters when l = 3 (p = 3).


4. Distribution of the sum of roots when m = 0.

(a) l = 2. The value of $G_{2,m}(x)$ has been given in [2] as

$$G_{2,m}(x) = K(2, m)\left[2\int_0^x u^{2m+1} e^{-2u}\,du - x^{m+1} e^{-x}\int_0^x u^{m} e^{-u}\,du\right], \tag{7}$$

where $K(2, m) = 2^{2m+1}/\Gamma(2m + 2)$. Then in the notation just given

$$\phi(x, m) = 2\int_0^x u^{2m+1} e^{-2u}\,du - x^{m+1} e^{-x}\int_0^x u^{m} e^{-u}\,du.$$

Replacing u by xu, we get

$$\phi(x, m) = 2x^{2m+2}\int_0^1 u^{2m+1} e^{-2xu}\,du - x^{2m+2} e^{-x}\int_0^1 u^{m} e^{-xu}\,du$$

$$= \frac{x^{2m+2}}{m+1}\left[\int_0^1 e^{-2xu}\,d(u^{2m+2}) - e^{-x}\int_0^1 e^{-xu}\,d(u^{m+1})\right]$$

$$= \frac{x^{2m+3}}{m+1}\left[2\int_0^1 u^{2m+2} e^{-2xu}\,du - e^{-x}\int_0^1 u^{m+1} e^{-xu}\,du\right].$$

Hence

$$\theta(x, m) = \text{const.}\left[2\int_0^1 u^{2m+2} e^{-2xu}\,du - e^{-x}\int_0^1 u^{m+1} e^{-xu}\,du\right], \tag{8}$$

and according to (6),

$$\int \prod (1 - \theta_i)^{n} (\theta_1 - \theta_2)\, e^{x\sum\theta_i}\,d\theta_1\,d\theta_2 = \text{const.}\ e^{2x}\left[2\int_0^1 u^{2n+2} e^{-2xu}\,du - e^{-x}\int_0^1 u^{n+1} e^{-xu}\,du\right]$$

$$= \text{const.}\left[2\int_0^1 (1 - u)^{2n+2} e^{2xu}\,du - \int_0^1 (1 - u)^{n+1} e^{xu}\,du\right] \tag{9}$$

by replacing u by 1 - u. Or,

$$E(e^{x\sum\theta_i}) = \text{const.}\left[2\int_0^1 (1 - u)^{2n+2} e^{2xu}\,du - \int_0^1 (1 - u)^{n+1} e^{xu}\,du\right].$$

The constant can be evaluated by putting x = 0, which gives const. = (n + 2)(2n + 3).

Then let $\Pr(\theta_1 + \theta_2 < Z) = \text{const.}\,[F_1(Z) - F_2(Z)]$, where $F_1(Z)$ and $F_2(Z)$ are
cumulative distribution functions given by integrating the density $(1 - u)^{2n+2}$ of
2u and $(1 - u)^{n+1}$ of u, respectively. It is easily seen that

$$F_2(Z) = \int_0^Z (1 - u)^{n+1}\,du = [1 - (1 - Z)^{n+2}]/(n + 2) \qquad (Z < 1).$$

Since $F_1(Z)$ is to be obtained from the density of 2u, we may substitute v = 2u
and then integrate. Thus

$$F_1(Z) = 2\int_0^Z (1 - v/2)^{2n+2}\,dv/2 = 2[1 - (1 - Z/2)^{2n+3}]/(2n + 3) \qquad (Z < 2).$$

Hence the result for l = 2 is

$$\Pr(\theta_1 + \theta_2 < Z) = 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3)[1 - (1 - Z)^{n+2}] \qquad (0 < Z < 1),$$

$$= 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3) \qquad (1 < Z < 2).$$
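The closed form for l = 2 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, Simpson's rule is a swapped-in integration method, and the check uses the identity that the mean of a variable on [0, 2] equals the integral of one minus its cumulative distribution function. The mean is compared with the first moment 3/(n + 3) derived later in the paper.

```python
def cdf_sum_two_roots(z, n):
    """Pr(theta_1 + theta_2 < z) for l = 2, m = 0, from the closed form above."""
    if z <= 0.0:
        return 0.0
    if z >= 2.0:
        return 1.0
    first = 2 * (n + 2) * (1 - (1 - z / 2) ** (2 * n + 3))
    # The second bracket stops changing once z reaches 1.
    inner = 1 - (1 - z) ** (n + 2) if z < 1.0 else 1.0
    return first - (2 * n + 3) * inner

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def mean_from_cdf(n):
    """E(theta_1 + theta_2) as the integral over [0, 2] of (1 - CDF)."""
    g = lambda z: 1.0 - cdf_sum_two_roots(z, n)
    # Integrate the two branches separately because the formula changes at z = 1.
    return simpson(g, 0.0, 1.0) + simpson(g, 1.0, 2.0)
```

For example, `mean_from_cdf(0)` should agree with 3/(0 + 3) = 1 up to integration error.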

(b) l = 3. The value of $G_{3,m}(x)$ as given in [2] is changed as

$$G_{3,m}(x) = K(3, m)\left\{2\int_0^x u^{2m+3} e^{-2u}\,du\int_0^x u^{m} e^{-u}\,du - 2\int_0^x u^{m+1} e^{-u}\,du\int_0^x u^{2m+2} e^{-2u}\,du\right.$$

$$\left. -\ \frac{x^{m+2} e^{-x}}{m+1}\left[2x^{2m+3}\int_0^1 u^{2m+2} e^{-2xu}\,du - x^{2m+3} e^{-x}\int_0^1 u^{m+1} e^{-xu}\,du\right]\right\},$$

using (8). K(3, m) is a constant independent of n. Putting xu for u in only the
first two terms of the right-hand side of the above equation, we get

$$G_{3,m}(x) = K(3, m)\, x^{3m+5}\left\{2\int_0^1 u^{2m+3} e^{-2xu}\,du\int_0^1 u^{m} e^{-xu}\,du - 2\int_0^1 u^{2m+2} e^{-2xu}\,du\int_0^1 u^{m+1} e^{-xu}\,du\right.$$

$$\left. -\ \frac{e^{-x}}{m+1}\left[2\int_0^1 u^{2m+2} e^{-2xu}\,du - e^{-x}\int_0^1 u^{m+1} e^{-xu}\,du\right]\right\}.$$

By integrating by parts we get $x^{3m+6}$ as a common factor on the right-hand
side of the above equation. Then according to (3) and (4) we have

$$\int_{0 < y_3 < y_2 < y_1 < 1} \prod y_i^{m} \prod_{i<j} (y_i - y_j)\, e^{-x\sum y_i} \prod dy_i = \text{const.}\left\{2(m + 2)\int_0^1 u^{2m+3} e^{-2xu}\,du\int_0^1 u^{m+1} e^{-xu}\,du\right.$$

$$\left. +\ 2(2m + 3)\,e^{-x}\int_0^1 u^{2m+4} e^{-2xu}\,du - 4(m + 2)\,e^{-x}\int_0^1 u^{2m+3} e^{-2xu}\,du + e^{-2x}\int_0^1 u^{m+2} e^{-xu}\,du\right\}. \tag{10}$$

Putting $y_1 = 1 - \theta_3$, $y_2 = 1 - \theta_2$, $y_3 = 1 - \theta_1$ and, changing m to n and
multiplying with $e^{3x}$, we get

$$\int \prod (1 - \theta_i)^{n} \prod_{i<j} (\theta_i - \theta_j)\, e^{x\sum\theta_i} \prod d\theta_i = \text{const.}\left\{2(n + 2)\int_0^1 u^{2n+3} e^{2x(1-u)}\,du\int_0^1 u^{n+1} e^{x(1-u)}\,du\right.$$

$$\left. +\ 2(2n + 3)\int_0^1 u^{2n+4} e^{2x(1-u)}\,du - 4(n + 2)\int_0^1 u^{2n+3} e^{2x(1-u)}\,du + \int_0^1 u^{n+2} e^{x(1-u)}\,du\right\}. \tag{11}$$

Thus we have

$$\Pr(\theta_1 + \theta_2 + \theta_3 < Z) = \text{const.}\,\{F_1(Z) + F_2(Z) + F_3(Z) + F_4(Z)\},$$

where $F_1(Z)$, $F_2(Z)$, $F_3(Z)$ and $F_4(Z)$ are the contributions to the cumulative
distribution by the four terms of the right-hand side of the following equation

$$E(e^{x\sum\theta_i}) = \text{const.}\left\{2(n + 2)\int_0^1 (1 - u)^{2n+3} e^{2xu}\,du\int_0^1 (1 - u)^{n+1} e^{xu}\,du\right.$$

$$\left. +\ 2(2n + 3)\int_0^1 (1 - u)^{2n+4} e^{2xu}\,du - 4(n + 2)\int_0^1 (1 - u)^{2n+3} e^{2xu}\,du + \int_0^1 (1 - u)^{n+2} e^{xu}\,du\right\},$$


where const. = (n + 2)(n + 3)(2n + 5). Proceeding according to the method
given in (a) we have

$$F_4(Z) = [1 - (1 - Z)^{n+3}]/(n + 3) \qquad (0 < Z < 1), \tag{12}$$

$$F_2(Z) = 2(2n + 3)[1 - (1 - Z/2)^{2n+5}]/(2n + 5) \qquad (0 < Z < 2), \tag{13}$$

$$F_3(Z) = -4(n + 2)[1 - (1 - Z/2)^{2n+4}]/(2n + 4) \qquad (0 < Z < 2). \tag{14}$$

Let us now consider $F_1(Z)$, which is the contribution of the first term. Let
$y_1$ and $y_2$ be distributed between 0 and 1 with densities $(1 - y_1)^{2n+3}$ and $(1 - y_2)^{n+1}$
respectively, then

$$F_1(Z) = 2(n + 2)\iint_{2y_1 + y_2 \le Z} (1 - y_1)^{2n+3} (1 - y_2)^{n+1}\,dy_1\,dy_2,$$

where Z goes from 0 to 3.

[Fig. 1. The unit square OABC in the $(y_1, y_2)$ plane, cut by the lines LM, NP and QR.]

Let us consider the distribution over the unit square OABC, Fig. 1, then for
Z < 1, Z < 2, and Z < 3 we have to integrate over OLM, OCNP, and OCQRA,
where LM, NP and QR are the three lines given by $2y_1 + y_2 = Z$ according as
Z < 1, Z < 2, and Z < 3. Write $F_{1,1}(Z)$, $F_{1,2}(Z)$ and $F_{1,3}(Z)$ for the value of the
double integral, without the factor 2(n + 2), in these three cases.

(i) The integration over OLM is given below.

$$F_{1,1}(Z) = \iint_{2y_1 + y_2 \le Z} (1 - y_1)^{2n+3} (1 - y_2)^{n+1}\,dy_1\,dy_2 \qquad \text{for } Z < 1, \tag{15}$$

or

$$F_{1,1}(Z) = \frac{1 - (1 - Z/2)^{2n+4}}{(n + 2)(2n + 4)} - \frac{\lambda\, 2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6}\left[I_{2/(3-Z)}(2n + 4, n + 3) - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)\right],$$

where

$$\lambda = B(2n + 4, n + 3) = \int_0^1 y^{2n+3} (1 - y)^{n+2}\,dy$$

and

$$\lambda\, I_{2/(3-Z)}(2n + 4, n + 3) = \int_0^{2/(3-Z)} y^{2n+3} (1 - y)^{n+2}\,dy.$$

(ii) The integration over OCNP is given below.

$$F_{1,2}(Z) = \frac{1 - (1 - Z/2)^{2n+4}}{(n + 2)(2n + 4)} - \frac{2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6}\left\{B(2n + 4, n + 3) - \lambda\, I_{(2-Z)/(3-Z)}(2n + 4, n + 3)\right\} \qquad (Z < 2). \tag{16}$$

(iii) In order to integrate over OCQRA, we shall integrate over the unit area
OCBA and subtract from this the value obtained by integrating over QRB.
Thus,

$$F_{1,3}(Z) = \frac{1}{(n + 2)(2n + 4)} - \frac{2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6} B(2n + 4, n + 3). \tag{17}$$

Hence the result for l = 3 can be expressed as

$$\Pr(\theta_1 + \theta_2 + \theta_3 < Z) = \text{const.}\,\{2(n + 2)F_{1,1}(Z) + F_2(Z) + F_3(Z) + F_4(Z)\}$$

$$= \text{const.}\left\{2(n + 2)\left\{\frac{1 - (1 - Z/2)^{2n+4}}{(n + 2)(2n + 4)} - \frac{\lambda\, 2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6}\left[I_{2/(3-Z)}(2n + 4, n + 3) - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)\right]\right\}\right.$$

$$\left. +\ \frac{2(2n + 3)[1 - (1 - Z/2)^{2n+5}]}{2n + 5} - 2[1 - (1 - Z/2)^{2n+4}] + \frac{1 - (1 - Z)^{n+3}}{n + 3}\right\} \qquad (0 < Z < 1),$$

and

$$= \text{const.}\,\{2(n + 2)F_{1,2}(Z) + F_2(Z) + F_3(Z) + F_4(1)\}$$

$$= \text{const.}\left\{2(n + 2)\left\{\frac{1 - (1 - Z/2)^{2n+4}}{(n + 2)(2n + 4)} - \frac{2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6}\left[B(2n + 4, n + 3) - \lambda\, I_{(2-Z)/(3-Z)}(2n + 4, n + 3)\right]\right\}\right.$$

$$\left. +\ \frac{2(2n + 3)[1 - (1 - Z/2)^{2n+5}]}{2n + 5} - 2[1 - (1 - Z/2)^{2n+4}] + \frac{1}{n + 3}\right\} \qquad (1 < Z < 2),$$

$$= \text{const.}\,\{2(n + 2)F_{1,3}(Z) + F_2(2) + F_3(2) + F_4(1)\}$$

$$= \text{const.}\left\{2(n + 2)\left\{\frac{1}{(n + 2)(2n + 4)} - \frac{2^{n+2}}{n + 2}\left(\frac{3 - Z}{2}\right)^{3n+6} B(2n + 4, n + 3)\right\} + \frac{2(2n + 3)}{2n + 5} - 2 + \frac{1}{n + 3}\right\} \qquad (2 \le Z < 3),$$

where const. = (n + 2)(n + 3)(2n + 5) and $\lambda = B(2n + 4, n + 3)$.
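The three branches for l = 3 can be collected into one routine. The following is a hedged sketch rather than the author's computation: the function names are hypothetical, the incomplete beta terms are replaced by a direct Simpson integration of the kernel $y^{2n+3}(1-y)^{n+2}$, and the limits of that integral are clipped to [0, 1] so that one expression covers all three ranges of Z.

```python
def simpson(f, a, b, steps=400):
    """Composite Simpson rule (even number of steps); empty ranges give 0."""
    if b <= a:
        return 0.0
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def cdf_sum_three_roots(z, n):
    """Pr(theta_1 + theta_2 + theta_3 < z) for l = 3, m = 0."""
    if z <= 0.0:
        return 0.0
    if z >= 3.0:
        return 1.0
    const = (n + 2) * (n + 3) * (2 * n + 5)
    zc = min(z, 2.0)  # terms in (1 - Z/2) stop changing at Z = 2
    head = (1 - (1 - zc / 2) ** (2 * n + 4)) / ((n + 2) * (2 * n + 4))
    w = 3.0 - z
    lo = max(0.0, (2.0 - z) / w)
    hi = min(1.0, 2.0 / w)
    # lambda * (I_hi - I_lo), written as an explicit integral of the beta kernel
    kernel = simpson(lambda y: y ** (2 * n + 3) * (1 - y) ** (n + 2), lo, hi)
    tail = 2 ** (n + 2) * (w / 2) ** (3 * n + 6) * kernel / (n + 2)
    f1 = 2 * (n + 2) * (head - tail)
    f2 = 2 * (2 * n + 3) * (1 - (1 - zc / 2) ** (2 * n + 5)) / (2 * n + 5)
    f3 = -2 * (1 - (1 - zc / 2) ** (2 * n + 4))
    f4 = (1 - (1 - min(z, 1.0)) ** (n + 3)) / (n + 3)
    return const * (f1 + f2 + f3 + f4)

def mean_sum_three_roots(n):
    """Mean of the sum as the integral over [0, 3] of (1 - CDF), branch by branch."""
    g = lambda z: 1.0 - cdf_sum_three_roots(z, n)
    return sum(simpson(g, a, a + 1.0, 200) for a in (0.0, 1.0, 2.0))
```

When n = 0 the density of the roots is symmetric under $\theta \to 1 - \theta$, so the sum is symmetric about 3/2; this gives a convenient check of the formula at Z = 1.5.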

The exact distribution is obtained for l = 4 by a similar method. The final
results are available with the author and are not given here due to lack of space.

The method given in the above sections can be used to find the distribution
of the sum of roots of a determinantal equation of any order under the condition
m = 0.

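5. Moments of the distributions. The moments can be obtained by expanding
the right-hand side of (6) in terms of x and then collecting the coefficients of x.
The moments for l = 2 have been derived here and the method is illustrated
below:

(a) l = 2. Equation (9) gives

$$\int \prod (1 - \theta_i)^{n} (\theta_1 - \theta_2)\, e^{x\sum\theta_i} \prod d\theta_i = \text{const.}\left\{2\int_0^1 (1 - u)^{2n+2} e^{2xu}\,du - \int_0^1 (1 - u)^{n+1} e^{xu}\,du\right\}$$

$$= \text{const.}\left\{2\int_0^1 (1 - u)^{2n+2} \sum_{t=0}^{\infty} \frac{(2xu)^t}{t!}\,du - \int_0^1 (1 - u)^{n+1} \sum_{t=0}^{\infty} \frac{(xu)^t}{t!}\,du\right\}$$

$$= \text{const.}\left\{2\sum_{t=0}^{\infty} \frac{(2x)^t\, \Gamma(t + 1)\Gamma(2n + 3)}{t!\ \Gamma(2n + t + 4)} - \sum_{t=0}^{\infty} \frac{x^t\, \Gamma(t + 1)\Gamma(n + 2)}{t!\ \Gamma(n + t + 3)}\right\}$$

$$= \text{const.}\left\{\frac{2}{2n + 3}\left[1 + \frac{2x}{2n + 4} + \frac{(2x)^2}{(2n + 4)(2n + 5)} + \frac{(2x)^3}{(2n + 4)(2n + 5)(2n + 6)} + \cdots\right]\right.$$

$$\left. -\ \frac{1}{n + 2}\left[1 + \frac{x}{n + 3} + \frac{x^2}{(n + 3)(n + 4)} + \frac{x^3}{(n + 3)(n + 4)(n + 5)} + \cdots\right]\right\},$$

so that

$$E(e^{x\sum\theta_i}) = 1 + \frac{x}{1!}\cdot\frac{3}{n + 3} + \frac{x^2}{2!}\cdot\frac{12(n + 2)(4n + 11)}{(n + 3)(n + 4)(2n + 4)(2n + 5)} + \frac{x^3}{3!}\cdot\frac{120(n + 2)(n + 3)(4n + 13)}{(2n + 4)(2n + 5)(2n + 6)(n + 3)(n + 4)(n + 5)} + \cdots.$$

Hence

$$\mu_1 = 3/(n + 3),$$

$$\mu_2 = 6(4n + 11)/(n + 3)(n + 4)(2n + 5)$$

and

$$\mu_3 = 30(4n + 13)/(n + 3)(n + 4)(n + 5)(2n + 5).$$

The moments for l = 3 and 4 can be obtained in a similar way.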
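The coefficient extraction for l = 2 can be reproduced exactly with rational arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, n is taken to be a non-negative integer so that the beta integrals reduce to factorials, and the constant (n + 2)(2n + 3) is the value obtained by putting x = 0 in the moment generating function.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Integral over [0, 1] of u**a * (1 - u)**b for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def moment_l2(t, n):
    """t-th moment about the origin of theta_1 + theta_2 (l = 2, m = 0, integer n).

    It is t! times the coefficient of x**t in the expanded generating function.
    """
    const = (n + 2) * (2 * n + 3)
    return const * (2 * 2 ** t * beta_int(t, 2 * n + 2) - beta_int(t, n + 1))
```

For instance, `moment_l2(1, 0)` returns `Fraction(1, 1)`, matching 3/(n + 3) at n = 0.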

Acknowledgements. The problem was suggested to me by Dr. P. L. Hsu.
I take this opportunity to express my gratitude to Dr. P. L. Hsu for guiding me
in this research. I am also indebted to Dr. Harold Hotelling for help and
suggestions in the work.


REFERENCES

[1] D. N. Nanda, "Distribution of a root of a determinantal equation," Annals of Math. Stat., Vol. 19 (1948), pp. 47-57.

[2] D. N. Nanda, "Limiting distribution of a root of a determinantal equation," Annals of Math. Stat., Vol. 19 (1948), pp. 340-350.



NOTES

This section is devoted to brief research and expository articles and other short items.


A NOTE ON THE POWER OF A NON-PARAMETRIC TEST

By F. J. Massey, Jr.

University of Oregon

1. Introduction. Let $x_1 < x_2 < \cdots < x_n$ be the ordered results of n independent
observations of a random variable X which has a continuous cumulative
distribution function F(x). The following test for the hypothesis that F(x) has
some specified form, say $F_0(x)$, has been suggested by Wolfowitz [1].

Form the cumulative distribution of the sample and obtain the maximum
deviation of this from $F_0(x)$. Thus if

$$S_n(x) = \begin{cases} 0 & \text{when } x < x_1, \\ k/n & \text{when } x_k \le x < x_{k+1}, \\ 1 & \text{when } x_n \le x, \end{cases}$$

the test statistic used would be

$$d = \max_x |F_0(x) - S_n(x)|\,\sqrt{n},$$

and the hypothesis would be rejected if d is large, say larger than $d_\alpha$ which is so
chosen that the probability of a type I error is $\alpha$. The limiting distribution
(as $n \to \infty$) of d has been tabled [2], and a short table of the distribution of d
for various small values of n ($n \le 80$) has been given [3].
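The statistic d can be computed directly from the ordered sample, since the largest deviation between a continuous $F_0$ and the step function $S_n$ must occur at one of the jumps. The sketch below assumes this reduction; the function name `massey_d` and the callable argument `F0` are hypothetical.

```python
import math

def massey_d(sample, F0):
    """d = sqrt(n) * max_x |F0(x) - S_n(x)| for a continuous hypothesised c.d.f. F0."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for k, x in enumerate(xs, start=1):
        fx = F0(x)
        # S_n jumps from (k-1)/n to k/n at x, so compare F0 with both levels.
        worst = max(worst, k / n - fx, fx - (k - 1) / n)
    return math.sqrt(n) * worst
```

For a sample tested against the uniform distribution on (0, 1) one would call `massey_d(sample, lambda x: x)` and compare the result with the tabled $d_\alpha$.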

The purpose of this note is as follows: 1. A lower bound for the power of the
test is given. 2. This test is shown to be consistent against any continuous alternative
$F(x) = F_1(x)$, where $F_1(x) \ne F_0(x)$. 3. The test is shown to be biased for
finite n. 4. An indication of similar results for a two sample test.

2. Lower bound for the power function. Let $\Delta = \max_x |F_0(x) - F_1(x)|$ and
let $x_0$ be a value of x such that $\Delta = |F_0(x_0) - F_1(x_0)|$. The probability that
$d > d_\alpha$ is certainly not less than $\Pr\{\sqrt{n}\,|F_0(x_0) - S_n(x_0)| > d_\alpha\}$. This is the
same as

$$1 - \Pr\{F_0(x_0) - d_\alpha/\sqrt{n} < S_n(x_0) < F_0(x_0) + d_\alpha/\sqrt{n}\},$$

which, since $S_n(x_0)$ is the proportion of observations falling less than or equal to $x_0$,
is given by the binomial probability law.

If $F(x) = F_1(x)$ the probability of an observation being less than $x_0$ is $F_1(x_0)$.
Since $F_0(x_0) = F_1(x_0) \pm \Delta$ the above probability can be written as follows:

$$1 - \Pr\{F_1(x_0) \pm \Delta - d_\alpha/\sqrt{n} < S_n(x_0) < F_1(x_0) \pm \Delta + d_\alpha/\sqrt{n}\}$$

$$= 1 - \Pr\{\pm\Delta - d_\alpha/\sqrt{n} < S_n(x_0) - F_1(x_0) < \pm\Delta + d_\alpha/\sqrt{n}\}$$

$$= 1 - \Pr\left\{\frac{-d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{(S_n(x_0) - F_1(x_0))\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}}\right\}.$$

$\Delta$ is fixed. It has been found [3] by observation for samples of size $\le 80$ that
$d_\alpha$ actually decreases in size as n increases. For sufficiently large n both

$$-d_\alpha \pm \Delta\sqrt{n} \quad \text{and} \quad d_\alpha \pm \Delta\sqrt{n}$$

have the same sign and the law of large numbers indicates that the above probability
approaches zero and the expression approaches unity.

The last expression above can also be used as a lower bound of the power of
the test for finite n.

For large values of n this probability is given approximately by the normal
distribution. Thus we can write for large n:

$$\text{power} > 1 - \frac{1}{\sqrt{2\pi}}\int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\,dt,$$

where

$$\lambda_1 = (-d_\alpha \pm \Delta\sqrt{n})/\sqrt{F_1(x_0)(1 - F_1(x_0))}$$

and

$$\lambda_2 = (d_\alpha \pm \Delta\sqrt{n})/\sqrt{F_1(x_0)(1 - F_1(x_0))}.$$

If n is so large that $\lambda_1$ and $\lambda_2$ are of the same sign and sufficiently different
from zero we can replace $F_1(x_0)$ by $\tfrac12$ and not decrease the value of the integral.
In this case we might use as a working formula

$$\lambda_1 = 2(-d_\alpha \pm \Delta\sqrt{n}),$$

$$\lambda_2 = 2(d_\alpha \pm \Delta\sqrt{n}).$$

Since

$$1 - \frac{1}{\sqrt{2\pi}}\int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\,dt$$

approaches one as n tends to infinity, the power, which is larger, must also approach
one, and thus the test is consistent.
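The normal-approximation bound is easy to evaluate. The sketch below is illustrative only: the function names are hypothetical, the `sign` argument selects one branch of the $\pm$ in the formulas, and the default `f1_x0 = 0.5` corresponds to the working formula with the factor 2. Any value passed for `d_alpha` (1.36 is used in the example purely as an assumed placeholder near the limiting 5 per cent point) should in practice be taken from the tables cited in [2] and [3].

```python
import math

def normal_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def power_lower_bound(n, delta, d_alpha, f1_x0=0.5, sign=+1):
    """Lower bound 1 - [Phi(lambda_2) - Phi(lambda_1)] for the power.

    sign = +1 or -1 selects the branch F0(x0) = F1(x0) +/- delta.
    f1_x0 = 0.5 reproduces the working formula lambda = 2(-/+ d_alpha +/- delta*sqrt(n)).
    """
    scale = math.sqrt(f1_x0 * (1.0 - f1_x0))
    shift = sign * delta * math.sqrt(n)
    lam1 = (-d_alpha + shift) / scale
    lam2 = (d_alpha + shift) / scale
    return 1.0 - (normal_cdf(lam2) - normal_cdf(lam1))
```

With $\Delta = 0.2$ and the assumed `d_alpha = 1.36`, the bound rises from about 0.24 at n = 25 to about 0.90 at n = 100, which illustrates the consistency argument.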

To demonstrate the biasedness of the test for fixed n consider the following
picture.

The $F_0(x)$ is shown as a heavy line and an alternative $F_1(x)$ as a dash-dot line.
$F_1(x)$ coincides with $F_0(x)$ except between the points x = a and x = b. If $S_n(x)$
falls outside of the indicated band at any point we agree to reject the hypothesis
$F(x) = F_0(x)$. If $F(x) = F_1(x)$ the $S_n(x)$ has no chance of being outside
the band between x = a and x = c, less chance between x = c and x = b than if
$F(x) = F_0(x)$, and the same chance for x larger than b. This indicates that the
probability of rejecting $F(x) = F_0(x)$, if actually $F(x) = F_1(x)$, is less than
the probability of rejecting $F(x) = F_0(x)$ if this is actually true. Thus the test
is biased.

[Fig. 1. The hypothesis $F_0(x)$ (heavy line), the alternative $F_1(x)$ (dash-dot line) and the acceptance band.]

3. Two sample test. Let $S_n(x)$ and $S'_m(x)$ be the cumulative distributions observed
for samples of sizes n and m from two populations having continuous
cumulative distribution functions F(x) and F'(x) respectively. Under the assumption
that F(x) = F'(x) the limiting distribution (as n and m tend to infinity)
of $d' = (n^{-1} + m^{-1})^{-1/2} \max_x |S_n(x) - S'_m(x)|$ has been found and tabled
[4], but the distribution of this statistic for small n and m is not known.

Suppose we wish to test the hypothesis that F(x) = F'(x) at level of significance
$\alpha$ and agree to reject this if d' is larger than $d'_\alpha$, where $d'_\alpha$ is the value
which would be exceeded a proportion $\alpha$ of the time if the hypothesis is true.
The values of $d'_\alpha$ are not known for small samples but are for the limiting case [4].

The same argument as in Section 2 gives a limiting lower bound to the power
of the test in terms of

$$\Delta = |F(x_0) - F'(x_0)|,$$

where $x_0$ is the value of x which maximizes $|F(x) - F'(x)|$, to be

$$1 - \frac{1}{\sqrt{2\pi}}\int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\,dt,$$

where

$$\lambda_1 = \left(-d'_\alpha\sqrt{\frac{1}{n} + \frac{1}{m}} \pm \Delta\right)\Big/\sqrt{\frac{F(x_0)(1 - F(x_0))}{n} + \frac{F'(x_0)(1 - F'(x_0))}{m}}$$

and

$$\lambda_2 = \left(d'_\alpha\sqrt{\frac{1}{n} + \frac{1}{m}} \pm \Delta\right)\Big/\sqrt{\frac{F(x_0)(1 - F(x_0))}{n} + \frac{F'(x_0)(1 - F'(x_0))}{m}}.$$

Since this lower bound approaches one as n and m approach infinity the power
also approaches one and the test is consistent.


REFERENCES

[1] J. Wolfowitz, "Non-parametric statistical inference," Proceedings of the Symposium on Mathematical Statistics and Probability, University of California Press, 1949, pp. 93-113.

[2] N. Smirnov, "Table for estimating the goodness of fit of empirical distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.

[3] F. Massey, "A note on the estimation of a distribution function by confidence limits," Annals of Math. Stat., Vol. 21 (1950), pp. 116-120.

[4] N. Smirnov, "On the estimation of the discrepancy between empirical curves of distribution for two independent samples," Bull. Math. Univ. Moscou, Série Int., Vol. 2, fasc. 2 (1939).


ON OPTIMUM SELECTIONS FROM MULTINORMAL POPULATIONS¹

By Z. W. Birnbaum and D. G. Chapman²
University of Washington

¹ Presented at the New York meeting of the Institute of Mathematical Statistics on
December 27, 1949.

² Research done under the sponsorship of the Office of Naval Research.

1. Introduction. Let $Y_1, Y_2, \cdots, Y_n$ be scores in n admission tests such as
those used in educational institutions, personnel selection, or testing of materials,
and let these scores be used as a basis for selecting a sub-population $\Pi^*$
from an initial population $\Pi$. This selection is usually performed in such a
manner that an achievement or performance score X has a distribution in $\Pi^*$
which shows some required improvement over the distribution of X in $\Pi$; such
an improvement may for example consist in changing the expectation E(X) of
X in $\Pi$ to a pre-assigned value E*(X) in $\Pi^*$. Among all selection procedures
based on $Y_1, \cdots, Y_n$ and achieving the required improvement of the distribution
of X, it appears desirable to find those which retain as large a portion of $\Pi$ as
possible. It will be shown that under certain assumptions the linear truncations
studied in an earlier paper [1] are such optimal selections.

2. Selection, truncation, linear truncation. Let the frequency of individuals
with the scores $(X, Y_1, \cdots, Y_n)$ be $F(X, Y_1, \cdots, Y_n)$ in $\Pi$ and
$F^*(X, Y_1, \cdots, Y_n)$ in $\Pi^*$. Since $\Pi^*$ was obtained by selection from $\Pi$, we have
$F^*/F \le 1$, and since the selection was made solely on the basis of the values of
$Y_1, \cdots, Y_n$, the ratio $F^*/F$ is independent of X. We thus have

$$\frac{F^*(X, Y_1, \cdots, Y_n)}{F(X, Y_1, \cdots, Y_n)} = \varphi(Y_1, \cdots, Y_n)$$

and

$$0 \le \varphi(Y_1, \cdots, Y_n) \le 1. \tag{2.1}$$

Let $N = \iint \cdots \int F(X, Y_1, \cdots, Y_n)\,dX\,dY_1 \cdots dY_n$ and
$N^* = \iint \cdots \int F^*(X, Y_1, \cdots, Y_n)\,dX\,dY_1 \cdots dY_n$
be the number of individuals in $\Pi$ and $\Pi^*$, and $f(X, Y_1, \cdots, Y_n)$ and
$f^*(X, Y_1, \cdots, Y_n)$ the distribution densities in $\Pi$ and $\Pi^*$, respectively, so that
$F = Nf$, $F^* = N^*f^*$ and
$\iint \cdots \int f\,dX\,dY_1 \cdots dY_n = \iint \cdots \int f^*\,dX\,dY_1 \cdots dY_n = 1$.
We then have

$$N^* f^* = \varphi N f,$$

and

$$\frac{N^*}{N} = \iint \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\,dX\,dY_1 \cdots dY_n. \tag{2.2}$$

Thus any selection of a subpopulation $\Pi^*$ from $\Pi$ based only on $Y_1, \cdots, Y_n$
defines a $\varphi(Y_1, \cdots, Y_n)$ satisfying (2.1). Conversely, if the frequencies
$F(X, Y_1, \cdots, Y_n)$ in $\Pi$ are given, any measurable $\varphi(Y_1, \cdots, Y_n)$ satisfying
(2.1) defines new frequencies $F^* = \varphi F$ and hence a selection from $\Pi$ based only
on $Y_1, \cdots, Y_n$.

These considerations lead to the following definitions:

A measurable function $\varphi(Y_1, \cdots, Y_n)$ which satisfies (2.1) is called a selection
in $Y_1, \cdots, Y_n$. If, in particular, $\varphi$ is the characteristic function of a set $\Omega$ in
$(Y_1, \cdots, Y_n)$, that is $\varphi = 1$ in $\Omega$ and $\varphi = 0$ in the complement of $\Omega$, then the
selection $\varphi$ will be called a truncation in $Y_1, \cdots, Y_n$ to the set $\Omega$. If $\Omega$ is defined
by a condition of the form

$$\sum_{j=1}^{n} a_j Y_j \ge t$$

with constant $a_j$, t, then the truncation to the set $\Omega$ will be called a linear truncation
in $Y_1, \cdots, Y_n$.

In view of (2.2) we will refer to

$$r(\varphi) = \iint \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\,dX\,dY_1 \cdots dY_n \tag{2.3}$$

as the fraction retained in the selection $\varphi$.

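3. A lemma. We will need the following slight generalization of the fundamental
lemma of Neyman-Pearson (cf. [2]).

Lemma. Let $G(Y_1, \cdots, Y_n)$, $G_1(Y_1, \cdots, Y_n)$, $\cdots$, $G_m(Y_1, \cdots, Y_n)$ be given
integrable functions and $c_1, \cdots, c_m$ given constants, and let $(\Phi)$ be the family of
all measurable functions $\varphi(Y_1, \cdots, Y_n)$ which satisfy the conditions

$$0 \le \varphi(Y_1, \cdots, Y_n) \le 1, \tag{3.1}$$

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, G_i(Y_1, \cdots, Y_n)\,dY_1 \cdots dY_n = c_i \tag{3.2}$$

for $i = 1, \cdots, m$.

If there exist constants $k_1, \cdots, k_m$ such that the characteristic function
$\varphi_0(Y_1, \cdots, Y_n)$ of the set

$$E = \mathop{E}_{(Y_1, \cdots, Y_n)}\left[G \ge \sum_{i=1}^{m} k_i G_i\right]$$

belongs to $(\Phi)$, then

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi_0\, G\,dY_1 \cdots dY_n \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi\, G\,dY_1 \cdots dY_n \tag{3.3}$$

for any $\varphi$ in $(\Phi)$.

Proof: We have $\varphi_0 = 1 \ge \varphi$ in E and $\varphi_0 = 0 \le \varphi$ in the complement of E, hence

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (\varphi_0 - \varphi)\left(G - \sum_{i=1}^{m} k_i G_i\right) dY_1 \cdots dY_n \ge 0,$$

and (3.3) follows since $\varphi_0$ and $\varphi$ fulfill (3.2).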


4. Selection from a multivariate normal population, for which the fraction
retained is maximum. From now on we assume that the conditional distribution
of X for given $Y_1, Y_2, \cdots, Y_n$ is normal with a mean which is a linear function
of the Y's and with a variance which is independent of them, i.e.,

$$f(X \mid Y_1, Y_2, \cdots, Y_n) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{\left(X - \sum_{j=1}^{n} p_j Y_j\right)^2}{2\sigma^2}\right]. \tag{4.1}$$

Let $Q(Y_1, \cdots, Y_n)$ denote the marginal density of $Y_1, \cdots, Y_n$.

Theorem 1. A selection such that

1° in $\Pi^*$ a proportion at most equal to a given proper fraction $\epsilon$ has values of X
below $X_0$, i.e. the $\epsilon$-quantile in $\Pi^*$ is greater than or equal to $X_0$, where $X_0$ is a
given number greater than the $\epsilon$-quantile in $\Pi$,

2° the fraction retained is maximum,
is a linear truncation.



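Proof: We have to maximize

$$r(\varphi) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\,dY_1 \cdots dY_n \tag{4.2}$$

under the condition

$$\frac{\displaystyle\int_{-\infty}^{X_0}\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\,dY_1 \cdots dY_n\,dX}{\displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\,dY_1 \cdots dY_n\,dX} \le \epsilon.$$

Substituting the expression (4.1) for $f(X \mid Y_1, \cdots, Y_n)$ and integrating with respect
to X we may rewrite this in the form

$$L(\varphi) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\left[\Psi\left(\frac{X_0 - \sum p_j Y_j}{\sigma}\right) - \epsilon\right] dY_1 \cdots dY_n \le 0, \tag{4.3}$$

where

$$\Psi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-t^2/2}\,dt,$$

and we have to maximize (4.2) under condition (4.3).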

Without loss of generality the inequality $L(\varphi) \le 0$ in (4.3) may be replaced
by equality. For if we had a selection $\varphi_1$ which maximizes (4.2) and satisfies (4.3)
with a strict inequality $L(\varphi_1) < 0$, then $\varphi_1$ could not be equal to 1 almost everywhere
since then we would have $F^* = F$ almost everywhere and $X_0$ would not
exceed the $\epsilon$-quantile in $\Pi$, in contradiction with 1°; hence $\bar\varphi_1 = \varphi_1 + a(1 - \varphi_1)$
for sufficiently small $a > 0$ would also satisfy (4.3) with a strict inequality but
would yield $r(\bar\varphi_1) > r(\varphi_1)$.

To solve our problem we now have to maximize (4.2) under the condition

$$L(\varphi) = 0. \tag{4.4}$$

Applying the lemma of Section 3, with m = 1, and

$$G(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n),$$

$$G_1(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n)\left[\Psi\left(\frac{X_0 - \sum p_j Y_j}{\sigma}\right) - \epsilon\right],$$

we conclude that the selection satisfying 1° and 2° will be the characteristic
function $\varphi_0(Y_1, \cdots, Y_n)$ of the set defined by

$$k\left[\Psi\left(\frac{X_0 - \sum p_j Y_j}{\sigma}\right) - \epsilon\right] \le 1, \tag{4.5}$$

provided k can be determined so that $\varphi_0$ satisfies (4.4).



To find such a k we consider

$$I(t) = \int \cdots \int_{\sum p_j Y_j \ge t} Q(Y_1, \cdots, Y_n)\left[\Psi\left(\frac{X_0 - \sum p_j Y_j}{\sigma}\right) - \epsilon\right] dY_1 \cdots dY_n.$$

As t tends to $-\infty$, I(t) tends to L(1), where L was defined by (4.3). Since the
$\epsilon$-quantile in $\Pi$ was less than $X_0$ it follows that $I(-\infty) = L(1) > 0$. Since
$I(t) < 0$ for large t, there exists $t_0$ such that $I(t_0) = 0$, and clearly
$\Psi((X_0 - t_0)/\sigma) - \epsilon > 0$.

Setting in (4.5) $k = [\Psi((X_0 - t_0)/\sigma) - \epsilon]^{-1}$, one obtains a $\varphi_0$ such that

$$L(\varphi_0) = I(t_0) = 0.$$

The selection $\varphi_0$ is the linear truncation to the set $\sum p_j Y_j \ge t_0$.
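The determination of the cutting score $t_0$ can be illustrated numerically in the simplest case of a single score. The sketch below is an assumption-laden example, not the authors' procedure: it takes n = 1, a standard normal marginal density Q for the score Y, hypothetical parameter values, Simpson's rule truncated at an upper limit of 10 for the integral I(t), and bisection on a bracket where I changes sign.

```python
import math

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def I(t, x0, eps, p, sigma, upper=10.0, steps=4000):
    """I(t) for one score Y ~ N(0, 1): integral over y >= t of Q(y)[Psi((x0 - p*y)/sigma) - eps]."""
    if t >= upper:
        return 0.0
    g = lambda y: normal_pdf(y) * (normal_cdf((x0 - p * y) / sigma) - eps)
    h = (upper - t) / steps
    total = g(t) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(t + k * h)
    return total * h / 3.0

def cutting_score(x0, eps, p, sigma, lo=-8.0, hi=5.0, iters=80):
    """Bisection for t0 with I(t0) = 0, assuming I(lo) > 0 > I(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if I(mid, x0, eps, p, sigma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the assumed values `x0 = -1.0`, `eps = 0.1`, `p = 0.6`, `sigma = 0.8`, the fraction retained by the truncation $Y \ge t_0$ is `1 - normal_cdf(t0)`.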

By a similar and somewhat simpler argument one proves the following theorem.

Theorem 2. A selection such that

1° in $\Pi^*$ the mean of X has a value greater than or equal to a pre-assigned number
m > 0,

2° the fraction retained is maximum,
is a linear truncation to a set $\sum_{j=1}^{n} p_j Y_j \ge t_0$.

An immediate consequence of Theorems 1 and 2 is that a linear truncation,
using a properly determined weighted score $\sum_{j=1}^{n} p_j Y_j$ and cutting score $t_0$, is
more economical than any truncation to a set $Y_j \ge t_j$, $j = 1, 2, \cdots, n$, that is
than any truncation performed on each admission score separately.

REFERENCES

[1] Z. W. Birnbaum, "Effect of linear truncation on a multinormal population," Annals of Math. Stat., Vol. 21 (1950), pp. 272-279.

[2] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Memoirs, Vol. I (1936), pp. 1-37, particularly pp. 10-11.

THE DISTRIBUTION OF DISTANCE IN A HYPERSPHERE

By J. M. Hammersley
University of Oxford

1. Summary. Deltheil ([1], pp. 114-120) has considered the distribution of
distance in an n-dimensional hypersphere. In this paper I put his results (17)
in a more compact form (16); and I investigate in greater detail the asymptotic
form of the distribution for large n, for which the rather surprising result emerges
that this distance is almost always nearly equal to the distance between the
extremities of two orthogonal radii. I came to study this distribution by the
need to compute a doubly-threefold integral, which measures the damage caused
to plants by the presence of radioactive tracers in their fertilizers; for the distribution
affords a method of evaluating numerically certain multiple integrals.
I hope to describe elsewhere this application of the theory.


2. Derivation of the frequency function. Let $T_1$ and $T_2$ be vector spaces of n
and 2n dimensions respectively. Let P and Q be any pair of points in $T_1$. Denote
by (PQ) the point in $T_2$, whose first n coordinates are the coordinates of P
in $T_1$ and whose last n coordinates are the coordinates of Q in $T_1$. Let {P} and
{Q} be point sets in $T_1$, and let {PQ} be the point set in $T_2$ such that $(PQ) \in \{PQ\}$
if and only if both $P \in \{P\}$ and $Q \in \{Q\}$. Let $M_1\{P\}$ denote the n-dimensional
measure of the point set {P} in $T_1$, and let $M_2\{PQ\}$ denote the 2n-dimensional
measure of the point set {PQ} in $T_2$. Then

$$M_2\{PQ\} = \int_{\{P\}} M_1\{Q\}\,dM_1\{P\}. \tag{1}$$

Let R be a fixed point in $T_1$, and let $S_n(a)$ be the n-dimensional hypersphere
in $T_1$ with centre R and radius a. Let A and B be any two points chosen at
random in $S_n(a)$, the distributions of A and B being independent and uniform
over the interior of $S_n(a)$. Denote the distance AB by r, and let $\lambda = r/2a$,
so that $\lambda$ may take any value in the interval $0 \le \lambda \le 1$. We require the frequency
function of $\lambda$, which we shall denote by $f_n(\lambda)$.

The volume content of $S_n(a)$ is

$$V_n(a) = \pi^{n/2} a^n/\Gamma(\tfrac12 n + 1); \tag{2}$$

and the content of the segment of the surface of $S_n(a)$ bounded by a right hyperspherical
cone, whose vertex is at R and whose line generators make a fixed
semi-vertical angle $\theta$ with a fixed radius of $S_n(a)$, is

$$U_n(a, \theta) = \frac{2\pi^{(n-1)/2} a^{n-1}}{\Gamma(\tfrac12 n - \tfrac12)}\int_0^{\theta} \sin^{n-2}\phi\,d\phi. \tag{3}$$

As a particular case of (3), the whole surface of $S_n(a)$ has content

$$U_n(a, \pi) = 2\pi^{n/2} a^{n-1}/\Gamma(\tfrac12 n). \tag{4}$$

Let {AB} be the point set in $T_2$ such that $(AB) \in \{AB\}$ if and only if the corresponding
points A and B satisfy all the inequalities

$$0 \le RA \le a, \quad 0 \le RB \le a, \quad r \le AB \le r + dr. \tag{5}$$


Then, by the definition of $f_n(\lambda)$,

$$M_2\{AB\} \propto f_n(r/2a)\,dr;$$

but since

$$\int_0^{2a} M_2\{AB\} = V_n^2, \qquad \int_0^{2a} f_n(r/2a)\,dr/2a = 1,$$

we have

$$M_2\{AB\} = V_n^2\, f_n(r/2a)\,dr/2a \equiv p_n(r, a)\,dr, \text{ say}. \tag{6}$$

Consider also the point set {CD} in $T_2$ such that $(CD) \in \{CD\}$ if and only if
the corresponding points C and D satisfy all the inequalities

$$0 \le RC \le a + da, \quad a \le RD \le a + da, \quad r \le CD \le r + dr. \tag{7}$$

For each fixed D of {D}, C is constrained to lie on the segment of the hyperspherical
shell of thickness dr, radius r, and centre D, bounded by the intersection
of this shell with $S_n(a + da)$. The hyperspherical cone, with vertex D,
whose line generators all pass through this intersection, has a semi-vertical
angle $\theta$ given by

$$\cos\theta = r/2a = \lambda, \tag{8}$$

and so, from (3), the $M_1$ of all C which satisfy (7) for each fixed D is
$U_n(r, \arccos\lambda)\,dr$. On the other hand the $M_1$ of all D which satisfy (7) is the content of
the hyperspherical shell of thickness da, radius a, and centre R, and is thus
$U_n(a, \pi)\,da$ by virtue of (4). Consequently, from (1)

$$M_2\{CD\} = U_n(r, \arccos\lambda)\, U_n(a, \pi)\,da\,dr. \tag{9}$$

On the other hand, by symmetry, $M_2\{CD\} = \tfrac12 M_2\{EF\}$, where $(EF) \in \{EF\}$
if and only if the corresponding points E and F satisfy either all the inequalities

$$0 \le RE \le a + da, \quad a \le RF \le a + da, \quad r \le EF \le r + dr,$$

or all the inequalities

$$0 \le RF \le a + da, \quad a \le RE \le a + da, \quad r \le EF \le r + dr.$$

We can express this in another way by saying that $(EF) \in \{EF\}$ if and only if
the corresponding points E and F satisfy all the inequalities

$$0 \le RE \le a + da, \quad 0 \le RF \le a + da, \quad r \le EF \le r + dr,$$

but do not satisfy all the inequalities

$$0 \le RE \le a, \quad 0 \le RF \le a, \quad r \le EF \le r + dr.$$

From this second point of view we see that

$$M_2\{EF\} = p_n(r, a + da)\,dr - p_n(r, a)\,dr = \frac{\partial}{\partial a}\, p_n(r, a)\,dr\,da;$$

and so

$$M_2\{CD\} = \frac12\,\frac{\partial}{\partial a}\, p_n(r, a)\,dr\,da. \tag{10}$$


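Then from (2), (3), (4), (6), (9), and (10),

$$\frac12\,\frac{\partial}{\partial a}\left\{\frac{\pi^{n} a^{2n}}{[\Gamma(\tfrac12 n + 1)]^2}\cdot\frac{f_n(r/2a)}{2a}\right\} = \frac{2\pi^{(n-1)/2} r^{n-1}}{\Gamma(\tfrac12 n - \tfrac12)}\int_0^{\arccos\lambda} \sin^{n-2}\phi\,d\phi \cdot \frac{2\pi^{n/2} a^{n-1}}{\Gamma(\tfrac12 n)}. \tag{11}$$

By performing the partial differentiation on the left-hand side, then substituting
$z = \cos\phi$ and $r = 2a\lambda$, and using the relations

$$\Gamma(z + 1) = z\,\Gamma(z), \qquad \pi^{1/2}\,\Gamma(n + 1) = 2^n\,\Gamma(\tfrac12 n + \tfrac12)\,\Gamma(\tfrac12 n + 1),$$

$$B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12) = \{\Gamma(\tfrac12 n + \tfrac12)\}^2/\Gamma(n + 1),$$

we reduce (11) to the form

$$(2n - 1) f_n(\lambda) - \lambda f_n'(\lambda) = \frac{2n(n - 1)\lambda^{n-1}}{B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12)}\int_\lambda^1 (1 - z^2)^{(n-3)/2}\,dz. \tag{12}$$

We multiply (12) by $-\lambda^{-2n}$ and use the reduction formula

$$(n - 1)\int_\lambda^1 (1 - z^2)^{(n-3)/2}\,dz = n\int_\lambda^1 (1 - z^2)^{(n-1)/2}\,dz + \lambda(1 - \lambda^2)^{(n-1)/2}. \tag{13}$$

Each side of the resulting equation is a perfect differential coefficient, and upon
integration we obtain

$$f_n(\lambda) = \frac{2n\lambda^{n-1}}{B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12)}\int_\lambda^1 (1 - z^2)^{(n-1)/2}\,dz + C\lambda^{2n-1}, \tag{14}$$

where C is the constant of integration. We obtain the cumulative distribution
function by integrating (14) over 0 to $\lambda$,

$$F_n(\lambda) = (2\lambda)^n\, I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12) + I_{\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12) + C\lambda^{2n}/2n, \tag{15}$$

where $I_x(p, q)$ is the incomplete beta-function ratio

$$I_x(p, q) = \int_0^x z^{p-1}(1 - z)^{q-1}\,dz\,/\,B(p, q)$$

tabulated by Pearson [2]. Putting $\lambda = 1$ in (15) we get

$$1 = F_n(1) = 1 + C/2n;$$

so C = 0, and we have the final result

$$f_n(\lambda) = 2^n n \lambda^{n-1}\, I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12). \tag{16}$$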


This compact form may be compared with Deltheil's expression [1] for the
frequency function of r, namely



where 


5 

h n (2 sin 4) = I 
Jo 


sm n 2 0 


/r 


sin" 2 <£ 


expressions which he evaluates only for the particular cases n = 3, 5, 7, 9. 
Interesting particular cases of (16) are

$$f_1(\lambda) = 2(1-\lambda), \qquad f_2(\lambda) = \frac{16}{\pi}\,\lambda\{\arccos\lambda - \lambda(1-\lambda^2)^{1/2}\}, \qquad f_3(\lambda) = 12\lambda^2(1-\lambda)^2(2+\lambda), \tag{18}$$

which give the appropriate frequency functions for a line, a circle, and a sphere
respectively.
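The special cases in (18) can be checked numerically against the general result (16). The following is a minimal sketch, not part of the original paper: it evaluates the incomplete beta-function ratio through the integral form that appears in (14), using Simpson's rule, and the function names (`f_n`, `f1`, `f2`, `f3`) and the step count are illustrative assumptions.

```python
import math

def f_n(n, lam, steps=2000):
    """Density (16) of lambda = r/(2a) for n dimensions (steps must be even)."""
    p = 0.5 * n + 0.5
    # B(n/2 + 1/2, 1/2) from gamma functions
    beta = math.gamma(p) * math.gamma(0.5) / math.gamma(p + 0.5)

    def g(z):
        # integrand (1 - z^2)^((n-1)/2); the max() guards against rounding below zero
        return max(0.0, 1.0 - z * z) ** ((n - 1) / 2.0)

    # Simpson's rule for the integral of g over [lam, 1]
    h = (1.0 - lam) / steps
    total = g(lam) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lam + i * h)
    # I_{1-lam^2}(p, 1/2) = (2/B) * integral, by the substitution used for (14)
    ratio = 2.0 * (total * h / 3.0) / beta
    return 2.0 ** n * n * lam ** (n - 1) * ratio

def f1(lam):
    return 2.0 * (1.0 - lam)

def f2(lam):
    return 16.0 / math.pi * lam * (math.acos(lam) - lam * math.sqrt(1.0 - lam * lam))

def f3(lam):
    return 12.0 * lam ** 2 * (1.0 - lam) ** 2 * (2.0 + lam)

for lam in (0.2, 0.5, 0.8):
    print(lam, f_n(1, lam) - f1(lam), f_n(2, lam) - f2(lam), f_n(3, lam) - f3(lam))
```

The printed differences should be close to zero; the case n = 2 is the least accurate because the integrand has a square-root singularity at z = 1.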

3. Recurrence relations and moments of the distribution. From (13) and (14)
we have a recurrence relation for penadjacent values of n,

$$f_n(\lambda) = \frac{4n\lambda^2}{n-2}\,f_{n-2}(\lambda) - \frac{2\,\Gamma(n+1)}{\{\Gamma(\tfrac12 n+\tfrac12)\}^2}\,\lambda^n(1-\lambda^2)^{(n-1)/2}. \tag{19}$$

In connection with (18) this shows that

$$f_{2n+1}(\lambda) = P_{4n+1}(\lambda), \qquad f_{2n}(\lambda) = P_{2n-1}(\lambda)\arccos\lambda + P_{4n-2}(\lambda)(1-\lambda^2)^{1/2}, \tag{20}$$

where $P_N(\lambda)$ denotes an unspecified polynomial in $\lambda$ of degree N or less.
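As an illustration of how the recurrence relation (19) generates the odd-dimensional frequency functions from $f_1$, here is a small sketch. It is not from the paper: the coefficients are written as $4n/(n-2)$ and $2\Gamma(n+1)/\{\Gamma(\tfrac12 n+\tfrac12)\}^2$, which is our reading of the damaged display, and the helper names `step` and `integrate` are hypothetical.

```python
import math

def step(n, f_prev):
    """Build f_n from f_{n-2} using recurrence (19), as read here (odd n only)."""
    lead = 4.0 * n / (n - 2)
    c = 2.0 * math.gamma(n + 1) / math.gamma(0.5 * n + 0.5) ** 2

    def f(lam):
        tail = max(0.0, 1.0 - lam * lam) ** ((n - 1) / 2.0)
        return lead * lam ** 2 * f_prev(lam) - c * lam ** n * tail

    return f

def integrate(f, steps=2000):
    """Simpson's rule on [0, 1]; steps must be even."""
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i / steps)
    return total / (3.0 * steps)

f1 = lambda lam: 2.0 * (1.0 - lam)   # the line, from (18)
f3 = step(3, f1)                      # should reproduce the sphere case of (18)
f5 = step(5, f3)

print(f3(0.5), 12 * 0.5 ** 2 * (1 - 0.5) ** 2 * (2 + 0.5))
print(integrate(f5))
```

Under this reading, `f3` agrees with the closed form for the sphere in (18), and `f5` integrates to unity as a frequency function must.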
From (16) the rth moment of $f_n(\lambda)$ about $\lambda = 0$ is

$$\mu_r' = \left(\frac{n\,\Gamma(n+1)}{\Gamma(\tfrac12 n+\tfrac12)}\right)\left(\frac{\Gamma(\tfrac12 n+\tfrac12 r+\tfrac12)}{(n+r)\,\Gamma(n+\tfrac12 r+1)}\right). \tag{21}$$

I have not been able to obtain the characteristic function of $f_n(\lambda)$ explicitly;
from (21) it appears to be of a higher type than the hypergeometric function.
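The moment formula (21) is easy to evaluate directly. The sketch below is an illustration only, with the hypothetical name `moment`; it uses log-gamma functions so that large n does not overflow, and prints a few values that can be checked by hand (for example, $\mu_0' = 1$, and $\mu_1' = 1/3$ for the line, where $f_1(\lambda) = 2(1-\lambda)$).

```python
import math

def moment(n, r):
    """r-th moment of lambda about 0 from (21), computed via log-gamma."""
    log_m = (math.log(n) + math.lgamma(n + 1) - math.lgamma(0.5 * n + 0.5)
             + math.lgamma(0.5 * n + 0.5 * r + 0.5)
             - math.log(n + r) - math.lgamma(n + 0.5 * r + 1))
    return math.exp(log_m)

for n in (1, 2, 3):
    # zeroth, first and second moments for the line, circle and sphere
    print(n, moment(n, 0), moment(n, 1), moment(n, 2))
```

For r = 2 the gamma functions cancel and (21) reduces to $n/\{2(n+2)\}$, which gives a quick consistency check on the implementation.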

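4. The asymptotic form of the distribution for large n. The distribution
function is, by (15),

$$F_n(\lambda) = (2\lambda)^n I_{1-\lambda^2}(\tfrac12 n+\tfrac12,\ \tfrac12) + I_{\lambda^2}(\tfrac12 n+\tfrac12,\ \tfrac12 n+\tfrac12). \tag{22}$$

We show firstly that as $n \to \infty$ the first term of this expression tends to zero.
This term is clearly zero if $\lambda = 0$. If $\lambda > 0$,

$$\int_0^{1-\lambda^2} z^{(n-1)/2}(1-z)^{-1/2}\,dz < \lambda^{-1}\int_0^{1-\lambda^2} z^{(n-1)/2}\,dz = \frac{(1-\lambda^2)^{(n+1)/2}}{\tfrac12(n+1)\lambda}.$$

Hence

$$(2\lambda)^n I_{1-\lambda^2}(\tfrac12 n+\tfrac12,\ \tfrac12) < \frac{(2\lambda)^n\,\Gamma(\tfrac12 n+1)}{\pi^{1/2}\,\Gamma(\tfrac12 n+\tfrac12)}\cdot\frac{(1-\lambda^2)^{(n+1)/2}}{\tfrac12(n+1)\lambda} = \frac{2\,\Gamma(\tfrac12 n+1)}{\pi^{1/2}\,\Gamma(\tfrac12 n+\tfrac32)}\,(1-\lambda^2)\{4\lambda^2(1-\lambda^2)\}^{(n-1)/2} < \frac{2\,\Gamma(\tfrac12 n+1)}{\pi^{1/2}\,\Gamma(\tfrac12 n+\tfrac32)} \to 0$$

as $n \to \infty$. Secondly, as $n \to \infty$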

Zx>(^n + hn + £) Nvib 1/4- 1)) ~ 1/4 n), 

(see Cramdr [3] p. 252 with p = q = h), where N x (jj., <t) is the normal cumula¬ 
tive distribution function of a; for mean n and variance c 2 . Hence X is asymptoti¬ 
cally distributed as N\{1/y/2, l/8n); and the asymptotic distribution of r is 
N r (a\/2, a/2n). This establishes the result stated in the summary. 
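The limiting mean and variance of $\lambda$ can be compared with the exact moments (21) for a moderately large n. This is a rough numerical sketch rather than part of the proof; it assumes the reading that $\lambda$ has asymptotic mean $1/\sqrt2$ and asymptotic variance $1/8n$, and the function names are illustrative.

```python
import math

def moment(n, r):
    """r-th moment of lambda about 0 from (21), computed via log-gamma."""
    log_m = (math.log(n) + math.lgamma(n + 1) - math.lgamma(0.5 * n + 0.5)
             + math.lgamma(0.5 * n + 0.5 * r + 0.5)
             - math.log(n + r) - math.lgamma(n + 0.5 * r + 1))
    return math.exp(log_m)

def mean_and_variance(n):
    """Exact mean and variance of lambda in n dimensions."""
    m1 = moment(n, 1)
    return m1, moment(n, 2) - m1 * m1

for n in (10, 100, 400):
    mean, var = mean_and_variance(n)
    # the two ratios should both approach 1 as n grows
    print(n, mean * math.sqrt(2.0), var * 8.0 * n)
```

Both printed ratios drift toward 1 as n increases, which is what the asymptotic statement predicts under the assumed reading.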





It can also be proved, by considering the limiting form of the recurrence
relation (19), that the frequency function $f_n$ is asymptotically normal. The main
difficulty of proving this fact lies in showing that the frequency function actually 
possesses a limiting form; and the proof is rather too long to be given here. 

REFERENCES 

[1] R. Deltheil, Probabilités Géométriques. Traité du Calcul des Probabilités et de ses
Applications: Tome II, Fascicule II, Gauthier-Villars, 1926.

[2] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge University Press, 1934.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.


A NOTE ON THE ASYMPTOTIC SIMULTANEOUS DISTRIBUTION OF 
THE SAMPLE MEDIAN AND THE MEAN DEVIATION FROM 
THE SAMPLE MEDIAN 

By R. K. Zeigler 
Bradley University 

Consider a random sample of 2k + 1 values from a one-dimensional distribution
of the continuous type with cumulative distribution function (cdf) F(x)
and probability density function (pdf) f(x) = F'(x). Let the mean, standard
deviation and median of the distribution be denoted by m, σ and θ respectively
(θ assumed to be unique). We shall suppose that in some neighborhood of
x = θ, f(x) has a continuous derivative f'(x).

If we arrange the sample values in ascending order of magnitude: 

$x_1 < x_2 < \cdots < x_{2k+1}$,

there is a unique sample median $x_{k+1}$ which we shall denote by ξ. The mean
deviation from the sample median is then defined by

$$M = \frac{1}{2k}\sum_{i=1}^{2k+1} |x_i - \xi|.$$

In the material that follows we shall assume that the sample items have been
ordered only to the extent that k of them are less than ξ and k of them are greater
than ξ.

We then have the following 

Theorem. Let f(x) be a pdf with finite second moment, continuous at x = θ with
f(θ) ≠ 0. Then the simultaneous distribution of ξ and M is asymptotically normal.

The means of the limiting distribution are θ, the population median, and u', the
mean deviation from the population median, while the asymptotic variances are
$1/\{4f^2(\theta)\,2k\}$ and $((m-\theta)^2 + \sigma^2 - u'^2)/2k$. The asymptotic expression for the
correlation coefficient is $(m-\theta)/\sqrt{(m-\theta)^2 + \sigma^2 - u'^2}$.
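To make the constants in the theorem concrete, the following sketch evaluates the asymptotic variances and the correlation coefficient for two populations. It is an illustration only: the function name `asymptotic_summary` is hypothetical, and the population constants (for the unit exponential, $m = 1$, $\theta = \log 2$, $\sigma^2 = 1$, $u' = \log 2$, $f(\theta) = \tfrac12$; for the unit normal, $u' = \sqrt{2/\pi}$, $f(\theta) = 1/\sqrt{2\pi}$) are standard values supplied here, not taken from the paper.

```python
import math

def asymptotic_summary(m, theta, sigma2, u_prime, f_theta, k):
    """Asymptotic variances of the median and of M, and their correlation,
    for a sample of 2k + 1 values, following the statement of the theorem."""
    var_median = 1.0 / (4.0 * f_theta ** 2 * 2 * k)
    q = (m - theta) ** 2 + sigma2 - u_prime ** 2
    var_mean_dev = q / (2 * k)
    rho = (m - theta) / math.sqrt(q)
    return var_median, var_mean_dev, rho

k = 50
# unit exponential population (skew, so the correlation is not zero)
expo = asymptotic_summary(1.0, math.log(2.0), 1.0, math.log(2.0), 0.5, k)
# unit normal population (symmetric, so m = theta and the correlation vanishes)
norm = asymptotic_summary(0.0, 0.0, 1.0, math.sqrt(2.0 / math.pi),
                          1.0 / math.sqrt(2.0 * math.pi), k)
print(expo)
print(norm)
```

The normal case shows the vanishing correlation for symmetric populations, while the exponential case gives a positive correlation because the mean exceeds the median.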

Proof: Let $u = (M - u')\sqrt{2k}$ and $v = (\xi - \theta)\sqrt{2k}$, where $u' = E\,|x - \theta|$.
Then the simultaneous characteristic function of the two random variables u





and v is given by the following: 

u')Vn+«la(t-#)V2fcj 

. E exp [«, (i L I«. - «I ~ »') VS + « - »■VS] 

d-a-f 


(2k + 1)! 
(kiy 



- | 

' 2H-1 * 

exp 

& 1 

52 

,£jb+a i-l 

[ 2* 


— u'j y/2k + A(£ — 0)V2*J 


f(xi)f(xi) ■ • • f(x k )f(x k+i ) ■ ■ • /fek+i)/(i) 
ctat+i • • ■ dxtt+'dxh • • • dxidi 


(2k + 1)' f f 
(kl) 2 IK 


, A 
exp ^ _ V2k 


(x + u')\ fo) dx \ 

Upw MU P 6 the .»*» I - • + »/VS. «■ “ te 

reduced to the following form: 

, _ m + r IT f exp [--% + uoI m dx 

(1) v'2/t(fc0 3 i,U-« L -* 


/ 


r a 




exp 


^ + ^ 


exp 


_V23b 


(* 


- «') /(it) 


dx 


- I 6XP 




Now 


•«"»/ («+ *■ 

X^ow , 

_ J_ [' C. + »ovw 4* + ^ 

2 ( 2 *) 





and 

/; 


exp 


Hi 

LV2 k 


%t\ r 

(x - u r ) f{x) dx = \ (x - u')f(x) dx 

~ml + 


fs(2 k, k) 


2k 


where for every fixed k , {i(2Jc, k) and fo(2k, k) —> 0 as k —* <». Similarly, under 
the substitution a; = (z/\/2k) + 0 , 


l 


s+u/iVw 


exp 


ik 

V2k 


(x - ii')] fix) dx - j[ f (^r + ») 


+ l il (V2k + e ~ u ')f (y/a + ') dz + 


VaE + 7 dz 

fa(2/c, ti) 


21c 


and 


pfl-KWv " V/* _ 1 fV ( Z \ 

/, “ p L ( * + “' ) J /W * " vl 1 ; (vl + e ) 


dz 


~fkl (v2k + 9 + u ')f (y/M + °) dz + ^r 

where fa( 2 fc, k) and ^4 (27c, k) —* 0 as k —>■ 00 for each fixed k ■ Substituting these 
B expressions in ( 1 ) and performing the indicated multiplications we find after 
some calculation that ( 1 ) can be reduced to the following form: 


<j>(k, k) = J 


(2k + 1)1 
V2ic(kl) 2 2 lk 

“4 ivf 


tlio- 1 — u' 2 ) — 4 ik(m — 0)yf 


*1 


1 - 


(V2k 


+ $ ) 


2 k 


+ 


( 77 = + 0 )}’ + m, «T / y \ 

-— )± -j .-/(, + ^y^ 


2k 


where 0 < z% < y and f(2 k, fi) —> 0 for every fixed k as k —*■ = 0 . Now taking the 
limit as fc —► 00 , we have 


lim 4>(k , th) = f * exp — - (<7 a - u' 1 ] 

*-»« \ir [_ l 


) 


+ - SL^id*. + t'fav /(«) dy , 


4iii(w - 0)/(0)y 4[/ 2 (0)]y s 

2 2 


Upon performing the integration, 


lim 4 >(ti, ij) = exp 


, 2kU(m — 6) , tl 

" * rtW/i\ 


2 /( 0 ) 


<1 V 
4/*(0)/J 





Since σ > u', this is the characteristic function for two variables which are
normally distributed. Thus, the simultaneous distribution of ξ and M is
asymptotically normal. It is of interest to note that, if the pdf f(x) is symmetric, the
correlation coefficient is zero, and M and ξ are asymptotically independent. We
might also note that $\phi(t_1, 0)$ is the characteristic function for the mean deviation
from the sample median. Thus, the random variable M is asymptotically normal
with asymptotic mean and variance u' and $((m-\theta)^2 + \sigma^2 - u'^2)/2k$ respectively.

The author wishes to express his appreciation to Professor A. T. Craig for 
valuable suggestions in the study of this problem. 

REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] R. K. Zeigler, "On the mean deviation from the median," unpublished thesis, State
University of Iowa.


NOTE ON THE EXTENSION OF CRAIG’S THEOREM TO NON-CENTRAL 

VARIATES 

By Osmer Carpenter 

Carbide and Carbon Chemical Corporation, Oak Ridge 

A theorem due to A. T. Craig [1] and H. Hotelling [3] concerning the distribution
of real quadratic forms in normal variates is extended to the case of
non-central normal variates with equal variance.

The following notation is used: $A$, $A_1$, $A_2$ are real symmetric matrices, L is an
orthogonal matrix, Γ is a diagonal matrix of latent roots, and X, Y, M and U
are column vectors.

Theorem. Let $X' = (x_1, \cdots, x_n)$ be a set of normally and independently
distributed variates with equal variance $\sigma^2$ and means $M' = (m_1, \cdots, m_n)$. Then
a necessary and sufficient condition that a real symmetric quadratic form
$Q(X) = X'AX$ of rank r be distributed as $\sigma^2\chi'^2$, where

$$p(\chi'^2, r, \lambda^2) = \frac{e^{-\lambda^2}\,e^{-\chi'^2/2}\,(\chi'^2)^{(r-2)/2}}{2^{r/2}} \sum_{j=0}^{\infty} \frac{(\lambda^2\chi'^2/2)^j}{j!\,\Gamma[(r+2j)/2]}, \tag{1}$$

is that $A^2 = A$. If $Q(X)/\sigma^2$ is distributed by $p(\chi'^2, r, \lambda^2)$, then $\lambda^2 = Q(M)/2\sigma^2$.

Further, let $Q_1(X) = X'A_1X$ and $Q_2(X) = X'A_2X$ be real symmetric quadratic
forms of ranks $r_1$ and $r_2$. Then a necessary and sufficient condition that $Q_1(X)$
and $Q_2(X)$ be statistically independent is that $A_1A_2 = 0$.

Proof. The theorem is proved by establishing the equivalence and factorization
of moment generating functions [4]. The moment generating function of
$p(\chi'^2, r, \lambda^2)$ is

$$G(t) = E\,e^{t\chi'^2/2} = e^{\lambda^2 t/(1-t)}\,(1-t)^{-r/2}. \tag{2}$$

Let $x_1, \cdots, x_n$ be normally and independently distributed with means
$E x_i = m_i$ and common variance $\sigma^2$. Without loss of generality, we may take
$\sigma^2 = 1$, changing to the general case when necessary with the transformation
$x_i' = x_i/\sigma$.
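The relation between the density (1) and the moment generating function (2) can be checked by direct numerical integration. The sketch below is an illustration under stated assumptions: it takes (1) to be the usual non-central $\chi^2$ density with parameter $\lambda^2$ (our reading of the damaged display), truncates the series and the range of integration, and uses hypothetical function names.

```python
import math

def density(x, r, lam2, terms=80):
    """Non-central chi-square density, as (1) is read here (requires lam2 > 0)."""
    if x <= 0.0:
        return 0.0
    s = 0.0
    for j in range(terms):
        # (lam2 * x / 2)^j / (j! * Gamma((r + 2j)/2)), computed in logs
        s += math.exp(j * math.log(lam2 * x / 2.0)
                      - math.lgamma(j + 1) - math.lgamma(0.5 * r + j))
    return math.exp(-lam2 - 0.5 * x) * x ** (0.5 * r - 1) / 2.0 ** (0.5 * r) * s

def mgf_numeric(t, r, lam2, upper=200.0, steps=4000):
    """E exp(t * chi'^2 / 2) by Simpson's rule on [0, upper]."""
    h = upper / steps
    g = lambda x: math.exp(0.5 * t * x) * density(x, r, lam2)
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0

def mgf_closed(t, r, lam2):
    """Closed form (2)."""
    return math.exp(lam2 * t / (1.0 - t)) * (1.0 - t) ** (-0.5 * r)

print(mgf_numeric(0.3, 4, 0.7), mgf_closed(0.3, 4, 0.7))
```

The two printed values should agree to several decimal places for moderate t below 1; the truncation point 200 and the 80 series terms are ample for the parameters shown but are not general-purpose choices.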

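Let $Q(X) = X'AX$ be a real symmetric quadratic form of rank r. Then the
moment generating function of $Q(X)$ is

$$G_Q(t) = E\,e^{tQ(X)/2} = (2\pi)^{-n/2}\int\cdots\int \exp\{\tfrac12 tX'AX - \tfrac12 (X-M)'(X-M)\}\,dx_1\cdots dx_n. \tag{3}$$

If t is restricted to values such that $|t| < |1/\gamma_0|$, where $\gamma_0$ is the dominant
latent root of A, then $I - tA$ is positive definite and

$$G_Q(t) = (2\pi)^{-n/2}\,e^{\frac12 tM'A(I-tA)^{-1}M}\int\cdots\int \exp\{-\tfrac12 [X-(I-tA)^{-1}M]'(I-tA)[X-(I-tA)^{-1}M]\}\,dx_1\cdots dx_n = e^{\frac12 tM'A(I-tA)^{-1}M}\,|I-tA|^{-1/2}. \tag{4}$$

If L is an orthogonal matrix such that

$$L'AL = \Gamma = \mathrm{diag}(\gamma_1, \cdots, \gamma_n),$$

where the $\gamma_i$ are the latent roots of A, then the transformation $M = LU$ gives

$$G_Q(t) = \exp\Big\{\tfrac12 t\sum_i \gamma_i u_i^2/(1-t\gamma_i)\Big\}\,|I - t\Gamma|^{-1/2}. \tag{5}$$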

A necessary and sufficient condition that $G_Q(t) = G(t)$ is that $A^2 = A$. If
$A^2 = A$, then all of the latent roots of A are +1 or 0, and sufficiency can be
established by substituting the appropriate value of each $\gamma_i$ into equation (5),
giving

$$G_Q(t) = e^{\lambda^2 t/(1-t)}\,(1-t)^{-r/2} = G(t). \tag{6}$$

Also $\lambda^2 = \sum \gamma_i u_i^2/2 = \tfrac12(U'\Gamma U) = \tfrac12(M'AM) = Q(M)/2$.

It is apparent from the form of $G_Q(t)$ that a necessary condition for
$G_Q(t) = G(t)$ is that $|I - tA|^{-1/2} = (1-t)^{-r/2}$. But it has been proved by Craig [1]
that the condition $A^2 = A$ is necessary, as well as sufficient, for this equality.

Next, let $Q_1(X) = X'A_1X$ and $Q_2(X) = X'A_2X$ be real symmetric quadratic
forms of ranks $r_1$ and $r_2$. Then from (4)

$$G(t_1, t_2) = E\,e^{\{t_1Q_1(X) + t_2Q_2(X)\}/2} = e^{\frac12 M'(t_1A_1+t_2A_2)(I-t_1A_1-t_2A_2)^{-1}M}\,|I - t_1A_1 - t_2A_2|^{-1/2}, \tag{7}$$

$t_1$, $t_2$ being restricted to values for which $(I - t_1A_1 - t_2A_2)$ is positive definite.
A necessary and sufficient condition that $G(t_1, t_2) = G_{Q_1}(t_1)\,G_{Q_2}(t_2)$ is $A_1A_2 = 0$.
The required equation in the moment generating functions is

$$G(t_1, t_2) = e^{\frac12 t_1M'A_1(I-t_1A_1)^{-1}M}\,|I-t_1A_1|^{-1/2}\cdot e^{\frac12 t_2M'A_2(I-t_2A_2)^{-1}M}\,|I-t_2A_2|^{-1/2}. \tag{8}$$

Assume $A_1A_2 = 0$. Then $(I - t_1A_1 - t_2A_2) = (I - t_1A_1)(I - t_2A_2)$
and $|I - t_1A_1 - t_2A_2| = |I - t_1A_1|\cdot|I - t_2A_2|$. Also

$$(t_1A_1 + t_2A_2)(I - t_1A_1 - t_2A_2)^{-1} = t_1A_1(I - t_1A_1)^{-1} + t_2A_2(I - t_2A_2)^{-1},$$

for using the identity $tA(I - tA)^{-1} = (I - tA)^{-1} - I$, this becomes

$$(I - t_2A_2)^{-1}(I - t_1A_1)^{-1} = (I - t_1A_1)^{-1} + (I - t_2A_2)^{-1} - I.$$

Multiplying both sides on the left by $(I - t_2A_2)$ and on the right by $(I - t_1A_1)$,
the identity follows. Thus the condition is sufficient.

It is apparent from the form of the moment generating functions that a
necessary condition for $G(t_1, t_2) = G_{Q_1}(t_1)\,G_{Q_2}(t_2)$ is that
$|I - t_1A_1 - t_2A_2| = |I - t_1A_1|\,|I - t_2A_2|$. However, it has been proved by Hotelling [3] and
Craig [2] that the condition $A_1A_2 = 0$ is necessary for this equality.

An extension can be made to correlated variates. Let $X' = (x_1, \cdots, x_n)$
be normally distributed with non-singular correlation matrix B and means
$M' = (m_1, \cdots, m_n)$. Then there exists a non-singular transformation $X = TZ$,
such that the variates Z are independent and have unit variance. Thus
$T^{-1}BT'^{-1} = I$, $B = TT'$ and $Q(X) = X'AX = Z'T'ATZ$. Applying the theorem
proved above, a necessary and sufficient condition that $Q(X)$ be distributed as
$\chi'^2$ is that $(T'AT)^2 = T'ABAT = T'AT$, or that $ABA = A$. As before,
$\lambda^2 = Q(M)/2$. In the same manner, a necessary and sufficient condition for
independence of $Q_1(X)$ and $Q_2(X)$ is that $(T'A_1T)(T'A_2T) = T'A_1BA_2T = 0$,
or that $A_1BA_2 = 0$.

REFERENCES 

[1] Allen T. Craig, "A note on the independence of certain quadratic forms," Annals of
Math. Stat., Vol. 14 (1943), page 195.

[2] Allen T. Craig, "Bilinear forms in normally correlated variables," Annals of Math.
Stat., Vol. 18 (1947), page 565.

[3] H. Hotelling, "A note on a matric theorem of A. T. Craig," Annals of Math. Stat.,
Vol. 15 (1944), page 427.

[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.





A SECOND FORMULA FOR THE PARTIAL SUM OF HYPERGEOMETRIC 
SERIES HAVING UNITY AS THE FOURTH ARGUMENT 


By Hermann von Schelling 
Naval Medical Research Laboratory, New London, Connecticut 1 


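A convergent hypergeometric series with 1 as fourth argument has been
expressed by Gauss, using gamma functions, as follows:

$$F(\alpha, \beta, \gamma; 1) = 1 + \frac{\alpha\cdot\beta}{\gamma\cdot 1} + \frac{\alpha(\alpha+1)\,\beta(\beta+1)}{\gamma(\gamma+1)\cdot 1\cdot 2} + \cdots = \frac{\Gamma(\gamma)\,\Gamma(\gamma-\alpha-\beta)}{\Gamma(\gamma-\alpha)\,\Gamma(\gamma-\beta)}. \tag{1}$$

Let us denote the νth partial sum of $F(\alpha, \beta, \gamma; 1)$ by $F_\nu(\alpha, \beta, \gamma; 1)$, and let us put

$$\frac{F_\nu(\alpha, \beta, \gamma; 1)}{F(\alpha, \beta, \gamma; 1)} = G_\nu(\alpha, \beta, \gamma). \tag{2}$$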


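The following equation is obvious:

$$G_\nu(\alpha, \beta, \gamma) = G_\nu(\beta, \alpha, \gamma). \tag{3}$$

In [1] it is shown that

$$G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\nu,\ \gamma-\beta-\alpha,\ \gamma-\alpha+\nu) \tag{4}$$

is valid if α is a positive integer.

If $(\gamma - \beta - \alpha)$ is a positive integer, (3) and (4) yield

$$G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\gamma-\beta-\alpha,\ \nu,\ \gamma-\alpha+\nu) = G_{\gamma-\beta-\alpha}(\alpha,\ \beta,\ \alpha+\beta+\nu).$$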

In terms of partial sums of the hypergeometric series this becomes

$$\frac{\Gamma(\gamma-\alpha)\,\Gamma(\gamma-\beta)}{\Gamma(\gamma)\,\Gamma(\gamma-\beta-\alpha)}\,F_\nu(\alpha, \beta, \gamma; 1) = \frac{\Gamma(\alpha+\nu)\,\Gamma(\beta+\nu)}{\Gamma(\nu)\,\Gamma(\alpha+\beta+\nu)}\,F_{\gamma-\beta-\alpha}(\alpha, \beta, \alpha+\beta+\nu; 1), \tag{5}$$

which is a new formula involving partial sums of hypergeometric series with 1
as fourth argument. It is more useful than (4) if $\gamma - \beta - \alpha < \alpha$ or $\gamma < 2\alpha + \beta$.
It is of theoretic interest that the arguments of the new series do not depend
on the third argument γ of the original series. Therefore it is possible to develop
a simple recursion formula. If we write (5) for (γ − 1) instead of γ, the series
of the second member has one term less. Subtracting these equations yields
after some simplifications


1 Opinions or conclusions contained in this paper are those of the author. They are
not to be construed as necessarily reflecting the views or endorsement of the Navy
Department.



$$(\gamma-\alpha-1)(\gamma-\beta-1)\,F_\nu(\alpha, \beta, \gamma; 1) - (\gamma-\beta-\alpha-1)(\gamma-1)\,F_\nu(\alpha, \beta, \gamma-1; 1) = \frac{\Gamma(\nu+\alpha)}{\Gamma(\alpha)}\cdot\frac{\Gamma(\nu+\beta)}{\Gamma(\beta)}\cdot\frac{\Gamma(\gamma)}{\Gamma(\nu+\gamma-1)}\cdot\frac{\Gamma(1)}{\Gamma(\nu)}. \tag{6}$$

Many recursion formulas are known for hypergeometric functions, but (6) may
be the first equation of this type linking two hypergeometric partial sums of ν
terms each.
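The recursion (6) can be verified in exact rational arithmetic for particular parameter values. The sketch below is illustrative and not from the paper; it writes the gamma-function ratios on the right-hand side of (6) as rising factorials, and the function names (`partial_sum`, `rising`, `lhs_6`, `rhs_6`) are hypothetical.

```python
from fractions import Fraction
from math import factorial

def partial_sum(alpha, beta, gamma, nu):
    """F_nu(alpha, beta, gamma; 1): the sum of the first nu terms of the series."""
    term, total = Fraction(1), Fraction(0)
    for j in range(nu):
        total += term
        term *= Fraction((alpha + j) * (beta + j), (gamma + j) * (j + 1))
    return total

def rising(x, k):
    """x (x+1) ... (x+k-1), i.e. Gamma(x+k)/Gamma(x)."""
    out = Fraction(1)
    for i in range(k):
        out *= x + i
    return out

def lhs_6(alpha, beta, gamma, nu):
    return ((gamma - alpha - 1) * (gamma - beta - 1) * partial_sum(alpha, beta, gamma, nu)
            - (gamma - beta - alpha - 1) * (gamma - 1) * partial_sum(alpha, beta, gamma - 1, nu))

def rhs_6(alpha, beta, gamma, nu):
    # Gamma(gamma)/Gamma(nu+gamma-1) = 1/rising(gamma, nu-1); Gamma(nu) = (nu-1)!
    return rising(alpha, nu) * rising(beta, nu) / (rising(gamma, nu - 1) * factorial(nu - 1))

print(lhs_6(2, 3, 7, 3), rhs_6(2, 3, 7, 3))   # both sides of (6) for one parameter set
```

Because `Fraction` arithmetic is exact, the two sides can be compared with `==` rather than with a tolerance, including for rational (non-integer) α and β.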

In order to demonstrate the numerical advantage of the new formula (5),
we restate the example of [1]. An urn contains N balls, of which a are black and
b white. A single ball is drawn. We note its color, return the ball into the urn
and add Δ balls of the same color. The probability that the $n_1$-th black ball
appears at the latest in the n-th drawing is


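$$W(n) = G_{n-n_1+1}\!\left(n_1,\ \frac{b}{\Delta},\ \frac{N}{\Delta}+n_1\right). \tag{7}$$

If $a/\Delta$ is a positive integer (5) yields

$$W(n) = \frac{(n-n_1+1)(n-n_1+2)\cdots n}{\left(\frac{b}{\Delta}+n-n_1+1\right)\left(\frac{b}{\Delta}+n-n_1+2\right)\cdots\left(\frac{b}{\Delta}+n\right)}\;F_{a/\Delta}\!\left(n_1,\ \frac{b}{\Delta},\ \frac{b}{\Delta}+n+1;\ 1\right). \tag{8}$$

If we take

Δ = 1, a = 1, b = N − 1,

we get

$$W(n) = \frac{n!\,(N+n-n_1-1)!}{(n-n_1)!\,(N+n-1)!}. \tag{9}$$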


Calculating W(n), using the original formula (7), is quite tedious, but (5)
sometimes simplifies the numerical work. Let us calculate the probability W(6)
that the third black ball appears in the 6th drawing, if the number of the original
balls is N = 10. Using formulas (7), (4), and (9) respectively we have

$$W(6) = \frac{3!\,9!}{12!}\left[1 + \frac{3\cdot 9}{13\cdot 1} + \frac{(3\cdot 4)(9\cdot 10)}{(13\cdot 14)(1\cdot 2)} + \frac{(3\cdot 4\cdot 5)(9\cdot 10\cdot 11)}{(13\cdot 14\cdot 15)(1\cdot 2\cdot 3)}\right] = \frac{4}{91},$$

$$W(6) = 1 - \frac{12!\,9!}{8!\,13!}\left[1 + \frac{4\cdot 1}{14\cdot 1} + \frac{(4\cdot 5)(1\cdot 2)}{(14\cdot 15)(1\cdot 2)}\right] = \frac{4}{91},$$

$$W(6) = \frac{6!\,12!}{3!\,15!} = \frac{4}{91}.$$
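The three evaluations of W(6) can be reproduced in exact arithmetic. The following sketch is an illustration with hypothetical helper names; it restricts all parameters to positive integers so that the gamma functions reduce to factorials, and it computes $G_\nu$ from definition (2) together with Gauss's formula (1).

```python
from fractions import Fraction
from math import factorial

def partial_sum(alpha, beta, gamma, nu):
    """F_nu(alpha, beta, gamma; 1): the sum of the first nu terms of the series."""
    term, total = Fraction(1), Fraction(0)
    for j in range(nu):
        total += term
        term *= Fraction((alpha + j) * (beta + j), (gamma + j) * (j + 1))
    return total

def gam(z):
    """Gamma function at a positive integer z."""
    return factorial(z - 1)

def G(alpha, beta, gamma, nu):
    """G_nu of (2): the partial sum divided by the full sum given by (1)."""
    norm = Fraction(gam(gamma - alpha) * gam(gamma - beta),
                    gam(gamma) * gam(gamma - alpha - beta))
    return norm * partial_sum(alpha, beta, gamma, nu)

def rhs_5(alpha, beta, gamma, nu):
    """Right-hand side of (5); needs gamma - beta - alpha to be a positive integer."""
    c = Fraction(gam(alpha + nu) * gam(beta + nu), gam(nu) * gam(alpha + beta + nu))
    return c * partial_sum(alpha, beta, alpha + beta + nu, gamma - beta - alpha)

# Example of the text: N = 10, a = 1, b = 9, Delta = 1, n1 = 3, n = 6
w_by_7 = G(3, 9, 13, 4)                 # four terms of the original series
w_by_4 = 1 - G(4, 1, 14, 3)             # three terms after applying (4)
w_by_9 = Fraction(factorial(6) * factorial(12), factorial(3) * factorial(15))
print(w_by_7, w_by_4, w_by_9, rhs_5(3, 9, 13, 4))
```

All four printed values are the same fraction, 4/91; the last one uses (5), where the new series has only one term because γ − β − α = 1.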





The time saved in using both formulas, of course, increases as the number of
terms, $n - n_1 + 1$, of the original series, increases.

Let us mention that the special distribution corresponding to (9) does not
have finite moments. For arbitrary values of N, a, Δ the arithmetic mean is

$$E(n) = \frac{N-\Delta}{a-\Delta}\,n_1, \tag{10}$$

the expectation of n(n + 1) is

$$E[n(n+1)] = \frac{(N-\Delta)(N-2\Delta)}{(a-\Delta)(a-2\Delta)}\,n_1(n_1+1), \tag{11}$$

and finally the variance

$$\sigma^2 = \frac{(N-\Delta)(N-a)[(n_1-1)\Delta + a]}{(a-\Delta)^2(a-2\Delta)}\,n_1. \tag{12}$$

The mode can be derived from the fact that

$$w(n+1) = w(n) \quad\text{for}\quad n = \frac{N}{a+\Delta}\,(n_1-1). \tag{13}$$

Especially we get w(11) = w(10) for our numerical example.

The mean and variance do not exist for a = Δ = 1, as in our example. However,
it is possible to find a number n so that W(n) takes any value near to
unity, for instance .99. For large n and small $n_1$ (9) yields the approximation

$$W(n) = \frac{n(n-1)\cdots(n-n_1+1)}{(N+n-1)(N+n-2)\cdots(N+n-n_1)} \approx \left[\frac{n - \frac{n_1-1}{2}}{N + n - \frac{n_1+1}{2}}\right]^{n_1}.$$

Hence, W(2666) = .99 for our example. One needs 2666 trials if one wants a
99% probability for getting three black balls. This surprising result cannot be
derived from the original formula (7).
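The quality of this approximation is easy to examine numerically. The sketch below is an illustration only, with hypothetical function names; it evaluates (9) as a product of $n_1$ ratios (for Δ = 1, a = 1, b = N − 1) and compares it with the bracketed approximation.

```python
from fractions import Fraction

def w_exact(n, n1, N):
    """W(n) from (9) for Delta = a = 1, written as a product of n1 ratios."""
    w = Fraction(1)
    for i in range(n1):
        w *= Fraction(n - i, N + n - 1 - i)
    return w

def w_approx(n, n1, N):
    """Approximation for large n and small n1: both products replaced by powers."""
    return ((n - (n1 - 1) / 2) / (N + n - (n1 + 1) / 2)) ** n1

print(w_exact(6, 3, 10))                       # the numerical example, exactly
print(float(w_exact(2666, 3, 10)), w_approx(2666, 3, 10))
```

For n = 2666 the exact and approximate values agree to many decimal places and both are close to .99 when rounded to two decimals.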


REFERENCE 

[1] H. von Schelling, "A formula for the partial sums of some hypergeometric series,"
Annals of Math. Stat., Vol. 20 (1949), pp. 120-122.





ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Chicago meeting of the Institute, April 28-29, 1950)

1. The Distribution of the Quotient of Ranges in Samples from a Rectangular 
Population. Paul R. Rider, Washington University, St. Louis, Missouri. 

The distribution of the quotient of the ranges of two independent, random samples from 
a continuous rectangular population is derived. The distribution is independent of the
population range and can be used to test the hypothesis that two samples came from the 
same rectangular population just as the distribution of the variance ratio is used to test 
whether two samples came from the same normal population. 

2. A Geometric Method for Finding the Distribution of Standard Deviations 
when the Sampled Population Is Arbitrary. (Preliminary Report). Paul 
Irick, Purdue University, 

For an ordered random sample, $x_1 \le x_2 \le \cdots \le x_n$, chosen from a population, f(x),
$a \le x \le b$, let $r_i = x_{i+1} - x_i \ge 0$, $i = 1, 2, \cdots, n-1$. Make the transformation



and call U' the 1/n I por tion o f the r' space bounded by the n - 1 sphere and hyperplanes, 
^ _ i 1 2 t rzi t . 

2 r. = 2 ns 2 , r. = a / -r,_i , t = 1, 2, • ■ • , « - 1, where s is the sample standard 

1 V i + 1 


deviation The point density in V, i(r'), is the transform of 

»b— Sr» 

S(r) - / /(*i)/(&x + n) ■ /(*l + *!+••■+ r„_i) dx 1 . 

J*,-« 

Change to generalized polar coordinates and call U the outer hypersphencal boundary 
of U' whereon the density is designated by 8(\/2 ns, v)< Then p(s), the probability law for 
s, is given by 

p(s) ds — nln nl, s’ > ~ i ds I ■ •/ i(-\/2ns, <p) sin" -1 (Oi ■ ■ • sin v>„-i • ■ d<pi, 

•'(=1 Jfn-l 


(n — t)(i -f-1) 


g if> x g arc cos 


tan<p,_ij , i = 1,2, ••• , n - 2, 


whenever b is infinite. The distribution of sample range is readily found in U' and is
expressible in the same form as p(s) with the same limits of integration. When b is finite,
the complete integral holds only for $0 \le s \le (b-a)/\sqrt{2n}$, there being $n^2/4$ connected arcs
in p(s) if n is even, and $(n^2-1)/4$ arcs if n is odd. The axes are rotated to give relatively
simple formulas for p(s) when $n \le 4$, the case of n = 5 also being discussed. The method
readily produces previously reported results for p(s). In the application of the method,
particular attention has been paid to the Type III and polynomial Type I populations. The
density function provides much information concerning the form of p(s) for various
populations, and contours of constant δ in U' are of theoretical interest.





3. Probability of a Correct Result with a Certain Rounding-off Procedure. 
W. S. Loud, University of Minnesota 

Consider the problem of the addition of n numbers expressed in the base B of numeration.
Supposing each number known to arbitrary accuracy, to obtain the sum accurate to k places,
one may round off each number to (k + 1) places, add, and round the sum to k places. If
the numbers are assumed uniformly distributed, the probability that the above procedure
gives the correct result may be found explicitly by use of characteristic functions. If the

base B is odd, the result is 2(rZ?) _l 1 sin"~ , Msin , /?a it - " -1 rfit, and if the bnBe B is even, 

Jo 

2(irB) -1 I sin® Bu cos u a - " -1 du Both formulas have the asymptotic formula 6'f*B( T n)-i/i 

Jo 

as n becomes infinite. 

4. Analysis of a One-person Game. (Preliminary Report). W. M. Kincaid, 
University of Michigan. 

The problem of allocation of supplies is one which arises in many military and economic
connections. The present report discusses a game constructed as a model of a simple
situation of this type. The player is given a supply of cards, and receives payments for giving
these up when certain random events occur during the period of play.

The optimal strategy, which maximizes the expected value of these payments, is
governed by certain critical times such that the player's response to a particular event depends
on whether it occurs before or after one of these times.


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Leo A. Aroian, on leave from Hunter College, is acting as a Research 
Physicist in charge of computations at the Hughes Aircraft Co., Department 
of Electronics and Guided Missiles, Culver City, California. 

Dr. Ralph A. Bradley from McGill University, Montreal, Canada will join
the staff as Associate Professor in the Department of Statistics at Virginia 
Polytechnic Institute on July 1, 1950. He will devote the majority of his time 
to research on rank order statistics. 

Dr. E. R. Dalziel has relinquished his post as Assistant Master at Technical
School, New Zealand, to become Senior Engineer with the Overseas
Telecommunication Commission, Australia.

On September 1, Dr. David Duncan from the University of Sydney, Sydney, 
Australia, will join the statistical staff of Virginia Polytechnic Institute as Asso¬ 
ciate Professor of Statistics. He will devote the majority of his time to teaching. 

Dr. C. H. Fischer has been promoted to the rank of Professor of Actuarial 
Mathematics in the Department of Mathematics and Professor of Insurance in
the School of Business Administration, University of Michigan, Ann Arbor, 
Michigan. 





Dr. E. J. Gumbel, Professor of Statistics at the New York New School for
Social Research, has been appointed Consultant to the National Bureau of
Standards and has been awarded a Guggenheim fellowship for finishing a book on the
theory of extreme values.

Dr. Eugene Lukacs, who has been on leave from Our Lady of Cincinnati
College and working as a Statistician for the U. S. Naval Ordnance Test Station,
Inyokern, California, is transferring to the Statistical Engineering Laboratory,
National Bureau of Standards, Washington, D. C.

Dr. R. B. Leipnik, formerly a member of the Institute for Advanced Study,
has accepted a position as Assistant Professor of Mathematics at the University 
of Washington, Seattle. 

Mr. Harold C. Mathisen, Jr., of the Kaiser-Frazer Corporation has been
transferred from Willow Run, Michigan, where he was an Assistant to the Director
of Sales, to Buffalo, New York, as Regional Credit-Distribution Supervisor.

Mr. Jack Moshman has resigned from the U. S. Atomic Energy Commission
at Oak Ridge, Tennessee, to accept a position as Statistician with the
Mathematics Panel of the Oak Ridge National Laboratory.

Dr. D. N. Nanda is now acting as Senior Scientific Officer in statistics at the
Technical Development Estt. Laboratory at Kanpur, India.

Mr. Shanti A. Vora was awarded at the commencement June 5, 1950, the
degree of Doctor of Philosophy in Mathematical Statistics from the University
of North Carolina, Chapel Hill. His dissertation, entitled "Bounds on the
Distribution of Chi-Square," won the William Chambers Coker Award in Science
for 1950 granted by the Elisha Mitchell Scientific Society for excellence in
research in all the scientific departments of the university. He has been appointed
Acting Assistant Professor in the Department of Statistics at Stanford
University, California, effective July 1, 1950, where he will be principally employed
in research on sampling inspection.

Professor Abraham Wald, Chairman, Department of Mathematical Statistics,
Columbia University, gave a series of lectures on the theory of statistical decision
functions at the Naval Ordnance Test Station, Inyokern, California, April 3-7,
1950. Representatives from several organizations and educational institutions
on the Pacific coast attended the lectures.


A copy of the bulletin of the Graduate School of Public Health, University 
of Pittsburgh, has been received at the Secretary’s office. The program of the 
Department of Biostatistics will be of particular interest to readers of the Annals.
The teaching and research activities of the Department of Biostatistics are aimed 
primarily at the development of methods for the statistical appraisal of the 
health problems of groups: the community, the family, and the special aggregates 
such as the population in industry and in school. 





The Educational Testing Service is offering for 1951-52 its fourth series of
research fellowships in psychometrics leading to the Ph.D. degree at Princeton
University. Open to men who are acceptable to the Graduate School of the
University, the two fellowships each carry a stipend of $2,375 a year and are
normally renewable. Fellows will be engaged in part-time research in the general
area of psychological measurement at the offices of the Educational Testing
Service and will, in addition, carry a normal program of studies in the Graduate
School. Competence in mathematics and psychology is a prerequisite for
obtaining these fellowships. Information and application blanks may be obtained from:
Director of Psychometric Fellowship Program, Educational Testing Service, 20
Nassau Street, Princeton, New Jersey.


Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Society of Actuaries to the
nine undergraduates ranking highest on the score of Part 2 of the 1950
Preliminary Actuarial Examinations are as follows:

First Prize of $200

Mattuck, Arthur P. .... Swarthmore College

Additional Prizes of $100

Dempster, Arthur P. .... University of Toronto

Haslam, M. Brent .... University of Buffalo

Hudek, Paul R. .... University of Minnesota

Jamieson, J. Ilac. .... University of Toronto

Leff, Milton M. .... University of Western Ontario

Milnor, John W. .... Princeton University

Reynolds, William F. .... College of the Holy Cross

Walter, John R. .... University of Toronto

The Society of Actuaries has authorized a similar set of nine prizes for the 
1951 examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three
examinations:

Part 1. Language Aptitude Examination.

(Reading comprehension, meaning of words and word relationships, antonyms, and
verbal reasoning.)

Part 2. General Mathematics Examination.

(Algebra, trigonometry, coordinate geometry, differential and integral calculus.)
Part 3. Special Mathematics Examination. 

(Finite differences, probability and statistics.) 

The 1951 Preliminary Actuarial Examinations will be prepared by the
Educational Testing Service and will be administered by the Society of Actuaries at
centers throughout the United States and Canada on May 18, 1951. The closing 
date for applications is March 15, 1951. 

Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 






New Members 

The following persons have been elected to membership in the Institute
(March 1, 1950 to May 31, 1950)

Ard, Everett E., B.S (Kansas State Teachers College), Student, University of Michigan, 
1661 Monson Court, Willow Run, Michigan 

Balnbrldge, T. R., B S (Clemson College, S C ), Supervisor, Koda Quality Inspection 
Group, Tennessee Eastman Coiporation, Kingsport, Tennessee. 

Bankler, James D., Ph D (Rice Institute), Associate Professor, Mathematics Department, 
McMaster University, Hamilton, Ontario, Canada. 

den Broeder, Jr., George G„ B S (Wayne Umv ), Student, Wayne University, 459 East 
Grand Boulevard, Detroit 7, Michigan 

Casas, Luis T., Ph.D (Umv of Bogota, Colombia), Professor of Statistics, Umversidad de 
los Andes and Pacultad de Economia Industrial y Comercial del Gimnasio Modeino, 
also Statistician, Compania Colombiana de Seguros and Companie Colombians de 
Seguros de Vida, Apartado Nacional No. 2088, Bogota, Colombia. 

Clark, Charles R., B S. (Umv of Michigan), Student, University of Michigan, 1216 West 
Cross Street, Ypsilanh, Michigan 

Dolby, James L., M A (Wesleyan Univ.), Mathematical Physicist, Belding-Heminway, 
Inc., 66 Grove Street, Putnam, Connecticut 

Elfvlng, Gustav, Ph D (Helsingfors, Finland), Professor of Mathematics, University of 
Helsingfors, Finland, now visiting Professor, Mathematics Department, Cornell Uni¬ 
versity, Ithaca, New York 

Embody, Daniel R.,MS (Cornell Umv ), Staff Statistician, The Washington Water Power 
Company, P O. Drawer 1446 , Spokane 6, Washington. 

Frazier, David, Ph.D (Stanford Umv ), Research Chemist, Chemical and Physical Re¬ 
search Division, The Standard Oil Company (Ohio), 2127 Cornell Road, Cleveland 6, 
Ohio. 

Graf, Herman S., B.A (Alfred Univ ), Student, Department of Mathematical Statistics, 
University of North Carolina, 68 Winans Drive, Yonkers 2, New York 

Greenberg, Bernard G., Ph.D. (N C State College), Associate Professor and Acting Head, 
Department of Biostatistics, School of Public Health, Associate Professor, Institute 
of Statistics, Raleigh, North Carolina. 

Grenander, Ulf, Ph D (Stockholm Univ.), Department of Mathematical Statistics, Norr- 
tullsgatan 16, Stockholm, Sweden 

Grosh, Jr., Louis E., M.S. (Purdue Umv ), Research Assistant, Mathematics Department, 
Purdue University, W Lafayette, Indiana. 

Hoffman, Walter, M A (Wayne Univ ), Statistician, Research Laboratory, Childrens Fund 
of Michigan, 2903 Elmhurst, Detroit 6, Michigan. 

Hoffman, Robert G., A.B (Stanford Umv.), Student, University of Michigan, 420 Thompson 
Street, Ann Arbor, Michigan 

Hopkins, George D, B S. (Ohio State), Statistician, Sylvania Electric Products, Inc , 
Ottawa, Ohio, 404 N. Jameson Avenue, Lima, Ohio. 

Horowitz, Jacob, B S (Columbia Umv ), Graduate Student, Department of Mathematical 
Statistics, Columbia University, 662 Riverside Drive, Now York 27, New York 

Huntsberger, David V., M.S. (West Va Umv ), Graduate Student, Iowa State College, 
Ames, Iowa, 224 Pommel Court, Ames, Iowa 

Kempff-Mercado, Rolando, Lie en Ciencias Econ (Umv Mayor de San Andres), Secretary 
General of Yacimientos Petroliferos Fiseales Bolivianos (Bolivian Oil Field Authority), 
P 0. Box 1283, La Paz, Bolivia 

Kennedy, Muriel E., B Sc. (University of Alberta), Statistician, Special Surveys Divi¬ 
sion, Dominion Bureau of Statistics, 128 Mason Terrace, Ottawa, Ontario, Canada 

Lander, Elmer L„ B A (Western Reserve Umv , Cleveland), Student, University of Michi- 

yaxsvin /II_7 _ J . A ~ Of\ DJ 11/1 





Li-Min, Tang, M.S. (Univ. of Mich.), Student, University of Michigan, 1109 Willard, Ann 
Arbor, Michigan. 

Lin, Shao-kung, M.A. (Louisiana State Univ.), Student, Department of Economics, 
University of Illinois, 1202b IK. University Avenue, Urbana, Illinois. 

Mandelson, Joseph, ILK. (College of City of New York), Mathematical Statistician, Chief 
Quality Assurance Branch, Inspection Division, Office of Chief, Army Chemical 
Center, Maryland, SU Cuiur Street, Edgewood Heights, Maryland. 

Marthens, Arthur S., B.S. (Carnegie Inst. of Tech.), Mathematical Statistician, Bureau of 
Ships, Navy Department, Washington, D. C., 1820 Manlier Street, Wilkinsburg, 
Pennsylvania. 

Masel, Marvin, A.M. (Columbia Univ.), Engineering Statistician, Goodyear Aircraft Corp., 
Akron, Ohio, Y.M.C.A., Room SOD, 80 Center Street, Akron, Ohio. 

McCune, Duncan C., B.A. (College of Wooster), Graduate Assistant, Mathematics 
Department, Purdue University, Lafayette, Indiana. 

Meyer, Paul L., B.S. (Univ. of Washington), Research Fellow, Laboratory of 
Mathematical Statistics, University of Washington, 0200-Sth, ALA’., Seattle 6, Washington. 

Milberg, Stanley, M.A. (Columbia Univ.), Statistician, $20 Turrell Avenue, So. Orange, 
New Jersey. 

Miser, Hugh J., Ph.D. (Ohio State Univ.), Operations Analyst, Headquarters, United States 
Air Force, 2718 liluine Drive, Chevy Chase 15, Maryland. 

Morrison, Milton, M.A. (Columbia Univ.), Instructor of Mathematics, Stevens Institute of 
Technology, Hoboken, New Jersey. 

Mulholland, Hugh P., Ph.D. (Cambridge Univ., England), Associate Professor of 
Mathematics, American University of Beirut, Beirut, Lebanon. 

Nelson, A. Carl, M.S. (Univ. of Delaware), Instructor in Mathematics, University of 
Delaware, Newark, Marshallton, Delaware. 

Neuwirth, Sidney I., B.A. (N. Y. Univ.), Statistician, Biological Research Laboratories, 
Schering Corporation, 80 Orange Street, Bloomfield, New Jersey. 

Pierce, James A., B.A. (Westminster College, Fulton, Mo.), Graduate Assistant, Purdue 
University, $05 Sylvia, West Lafayette, Indiana. 

Powell, Claude J., B.S. (Univ. of Tennessee), Quality Control Engineer, North American 
Rayon Corp., 015 flattie Avenue, Elizabethton, Tennessee. 

Rojas, Basilio A., B.S. (National College of Agriculture, Mexico), Graduate Student, Iowa 
State College, Statistical Laboratory, Ames, Iowa. 

Roseboom, John H., M.S. (Dartmouth College), Instructor, Department of Economics, 
Indiana University, Bloomington, Indiana. 

Sandlin, William T., A.B. (Marshall College, Huntington, W. Va.), Independent Sales 
Engineer, 20 Fairfax Drive, Huntington, West Virginia. 

Schmitt, Samuel A., B.K. (Univ. of Chicago), Research Analyst, Department of Defense, 
Washington, D. C., DRO N Rhodes Street, Arlington, Virginia. 

Shaw, Richard H., M.S. (Purdue Univ.), Research Fellow, Purdue University, F.P.H.A. 
513-1 Airport Road, West Lafayette, Indiana. 

Smith, Hugh F., M.S.A. (Cornell Univ.), Professor of Experimental Statistics, Institute of 
Statistics, University of North Carolina, Box 51)57, College Station, Raleigh, North 
Carolina. 

Sommers, Lysle D., B.S. (Bowling Green State Univ.), Sampling Assistant, Survey 
Research Center, University of Michigan and Graduate Student, 1583 Leeds Court, Willow 
Run, Michigan. 

Springer, Clifford H., M.Sc. (Purdue Univ.), Instructor, Department of Mathematics, 
Research Assistant, Statistical Laboratory, Recitation Building, Purdue University, 
West Lafayette, Indiana. 

Stearman, Robert L., M.S. (Oregon State College), Teaching Fellow, Department of 
Mathematics, Oregon State College, Corvallis, Oregon. 





Tingey, Fred H., M.S. (Univ. of Washington), Research Associate, University of 
Washington, SS10 Goldendale Place, Seattle 5, Washington. 

Tipton, Lamar B., M.A. (Columbia Univ.), Statistical Clerk, Standard Oil Co. of New 
Jersey, 6 W. BO^th Street, Shanks Village, Orangeburg, New York. 

Topp, Chester W., M.A. (Univ. of Illinois), Associate Professor of Mathematics, Fenn 
College, 1524 Comton Road, Cleveland Heights 18, Ohio. 

Vora, Shanti A., M.Sc. (Bombay), Student, Department of Mathematical Statistics, 
University of North Carolina, 210 A, Phillips Hall, Chapel Hill, North Carolina. 

Willis, Myron J., A.M. (Indiana Univ.), Instructor of Mathematics, Purdue University, 
Statistical Laboratory, Lafayette, Indiana. 


REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 

The forty-third meeting and the first regional Mid-western meeting of the 
Institute of Mathematical Statistics was held on the campus of the University 
of Chicago, Chicago, Illinois on Friday and Saturday, April 28 and 29, 1950. 
The morning session on April 29 was held jointly with the American Mathematical 
Society. The following forty-six members of the Institute were registered 
as present: 

K. J. Arnold, Max Astrachan, Reinhold Baer, Alvin G. Brooks, I. W. Burr, P. G. Carlson, 
Herman Chernoff, P. S. Dwyer, H. P. Evans, J. S. Frame, Mary Goins, R. D. Gordon, 
John Gurland, P. R. Halmos, P. C. Hammer, W. L. Hart, M. A. Hatke, P. E. Irick, Howard 
L. Jones, Leo Katz, J. P. Kelly, W. M. Kincaid, L. A. Knowler, Tjalling Koopmans, F. C. 
Leone, F. W. Lott, W. G. Madow, A. M. Mark, John W. Mauchly, Kenneth May, Duncan 
C. McCune, Cyril G. Peckham, G. B. Price, P. R. Rider, Norman Rudy, L. J. Savage, G. R. 
Seth, Richard H. Shaw, Jack Sherman, M. D. Springer, Robert G. D. Steel, Z. Szatrowski, 
J. V. Talacko, R. M. Thrall, L. M. Weiner, M. E. Wescott. 

Professor Lloyd A. Knowler of the University of Iowa presided at the Friday 
afternoon session. The program consisted of the following invited papers: 

1. Why and Where Should Courses in Statistics Be Offered to Engineering Students? M. E. 
Wescott, Northwestern University. 

2. What and How Statistics Should be Taught to Engineering Students. I. W. Burr, 
Purdue University. 

Following this session a tea was given by the Department of Mathematics of 
the University of Chicago. 

Professor John Gurland of the University of Chicago presided at the Saturday 
morning session. This session was held jointly with the American Mathematical 
Society. The program was as follows: 

1. The Distribution of the Quotient of Ranges in Samples From a Rectangular Population. 
Paul R. Rider, Washington University, St. Louis, Missouri. 

2. A Geometric Method for Finding the Distribution of Standard Deviations when the 
Sampled Population Is Arbitrary. (Preliminary report.) Paul Irick, Purdue University. 

3. Probability of a Correct Result with a Certain Rounding-off Procedure. W. S. Loud, 
University of Minnesota. 

4. Analysis of a One-person Game. (Preliminary report.) W. M. Kincaid, University of 
Michigan. 





Professor W. G. Madow of the University of Illinois presided at the Saturday 
afternoon session. The program consisted of the following invited papers: 

1. Correlation and Regression with Matrix Factorization. P. S. Dwyer, University of 
Michigan. 

2. The Identification of Structural Characteristics. Tjalling Koopmans, University of 
Chicago, and Olav Reiersøl, University of Oslo, Norway. 


K. J. Arnold, 
Associate Secretary 



THE PROBLEM OF THE GREATER MEAN 

By Raghu Raj Bahadur and Herbert Robbins 1 
University of North Carolina 

1. Introduction and summary. Let $\pi_1$, $\pi_2$ be normal populations with means 
$m_1$, $m_2$ respectively and a common variance $\sigma^2$, the parameter point 
$\omega = (m_1, m_2 : \sigma)$ which characterizes the two populations being unknown, and 
let $\Omega$ be an arbitrary given set of possible points $\omega$. Random samples of fixed 
sizes $n_1$, $n_2$ are drawn from $\pi_1$, $\pi_2$ respectively, giving the combined sample 
point $v = (x_{11}, \ldots, x_{1n_1}; x_{21}, \ldots, x_{2n_2})$. For reasons which will be 
made clear later in connection with practical examples, any function $f(v)$ such 
that $0 \le f(v) \le 1$ is called a decision function, and for any such $f(v)$ the risk 
function is defined to be 

(1) $r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega] \ge 0,$ 

where $E$ denotes the expectation operator. A decision function $\bar f(v)$ is said to be 
(a) uniformly better than $f(v)$ if $r(\bar f \mid \omega) \le r(f \mid \omega)$ for all $\omega$ in 
$\Omega$, the strict inequality holding for at least one $\omega$, (b) admissible if no decision 
function is uniformly better than $\bar f(v)$, and (c) minimax if 

$\sup_{\omega \in \Omega} [r(\bar f \mid \omega)] = \inf_{f} \sup_{\omega \in \Omega} [r(f \mid \omega)].$ 

The “problem of the greater mean” is, for any given $\Omega$, to determine the minimax 
decision functions, particularly those which are also admissible. Special 
interest attaches to the case in which there exists a unique minimax decision 
function $\bar f(v)$ (in the sense that if $f(v)$ is any minimax decision function then 
$f(v) = \bar f(v)$ for almost every $v$ in the sample space); such an $\bar f(v)$ is 
automatically admissible. 

The problem of the greater mean is, of course, a special problem in Wald’s 
general theory of statistical decision functions [1]. Our results will, however, be 
derived by very simple direct methods which make no use of Wald’s general 
theorems. 

We cite without proofs a few examples in order to show how strongly the 
solution of the problem of the greater mean depends on the structure of $\Omega$. In 
each case the minimax decision function is a function only of the two sample 
means $\bar x_1$, $\bar x_2$. 

(i) Let $\Omega'$ consist of the two points $(a, b : \sigma)$ and $(b, a : \sigma)$, with $a < b$. Then 

(2) $f^*(v) = 1$ if $n_1 \bar x_1 - n_2 \bar x_2 > (n_1 - n_2)(a + b)/2$, 
    $f^*(v) = 0$ otherwise, 

is the unique minimax decision function. 
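
The rule of Example (i) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes only that $n_1 \bar x_1 - n_2 \bar x_2$ is normally distributed with variance $(n_1 + n_2)\sigma^2$, and the function names `f_star` and `error_probabilities` are illustrative. It suggests that the two probabilities of a wrong decision coincide at the two points of $\Omega'$, as one would expect of a minimax rule on a two-point set.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_star(sample1, sample2, a, b):
    """Rule of Example (i): 1 if n1*xbar1 - n2*xbar2 > (n1 - n2)(a + b)/2, else 0."""
    n1, n2 = len(sample1), len(sample2)
    statistic = sum(sample1) - sum(sample2)  # equals n1*xbar1 - n2*xbar2
    return 1 if statistic > (n1 - n2) * (a + b) / 2.0 else 0

def error_probabilities(a, b, sigma, n1, n2):
    """Return P(f* = 1 | (a, b)) and P(f* = 0 | (b, a)), assuming a < b."""
    threshold = (n1 - n2) * (a + b) / 2.0
    sd = sigma * math.sqrt(n1 + n2)  # s.d. of n1*xbar1 - n2*xbar2
    wrong_at_ab = normal_cdf(-(threshold - (n1 * a - n2 * b)) / sd)
    wrong_at_ba = normal_cdf((threshold - (n1 * b - n2 * a)) / sd)
    return wrong_at_ab, wrong_at_ba

p_ab, p_ba = error_probabilities(0.0, 1.0, 1.0, 3, 7)
print(p_ab, p_ba)  # the two error probabilities agree
```

Under the risk function (1), each risk is $(b - a)$ times the corresponding error probability, so equal error probabilities mean equal risks at the two points.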

1 This work was supported in part by the Office of Naval Research. 



(ii) Let $\Omega''$ consist of the two points $(c + h, c : \sigma)$ and $(c - h, c : \sigma)$, with 
$h > 0$. Then 

(3) $f_c^0(v) = 1$ if $\bar x_1 > c$, 
    $f_c^0(v) = 0$ otherwise, 

is the unique minimax decision function. 

(iii) Let $\Omega'''$ consist of the three points $(\tfrac12, -\tfrac12 : 1)$, $(\tfrac12, \tfrac32 : 1)$, 
$(-\tfrac32, -\tfrac12 : 1)$, and let $n_1 = n_2 = n$. Then 

(4) $f^{**}(v) = 1$ if $e^{-2n\bar x_1} + e^{2n\bar x_2} < \lambda$, 
    $f^{**}(v) = 0$ otherwise, 

where $\lambda$ is a certain definite constant, is the unique minimax decision function. 

The parameter spaces of two or three points specified in these examples are 
rather trivial, but in fact the corresponding decision functions (2), (3), (4) 
remain the unique minimax solutions of the decision problem with respect to 
much more general parameter spaces. Thus, for example, it is clear that $f^*(v)$ 
will remain the unique minimax decision function with respect to any $\Omega$ which 
contains $\Omega'$ and is such that 

$\sup_{\omega \in \Omega} [r(f^* \mid \omega)] = \sup_{\omega \in \Omega'} [r(f^* \mid \omega)].$ 

Corresponding remarks apply to $f_c^0(v)$ and $f^{**}(v)$. 
When $n_1 = n_2$, (2) reduces to 

(5) $f^0(v) = 1$ if $\bar x_1 > \bar x_2$, 
    $f^0(v) = 0$ otherwise. 

This decision function is of particular interest when both the means $m_1$, $m_2$ are 
unknown. It will be shown that whether or not $n_1 = n_2$, $f^0(v)$ is the unique 
minimax decision function under certain conditions on $\Omega$ which are likely to 
hold in practice, at least when both $n_1$ and $n_2$ are sufficiently large (Theorem 3). 
Likewise, $f_c^0(v)$, which is the analogue of $f^0(v)$ when one of the means ($m_2$) is 
known exactly, is apt to be the unique minimax decision function in such cases, 
at least when $n_1$ is sufficiently large (Theorem 4). These results on $f^0(v)$ and 
$f_c^0(v)$ form the main results of the present paper. 
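
As an illustration (a sketch under stated assumptions, not taken from the paper), the risk (1) of the rule "choose the population with the larger sample mean" has a closed form, because $\bar x_1 - \bar x_2$ is normal with mean $m_1 - m_2$ and variance $\sigma^2(1/n_1 + 1/n_2)$. The helper name `risk_f0` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk_f0(m1, m2, sigma, n1, n2):
    """Risk (1) of the rule f = 1 if xbar1 > xbar2, else 0.

    The loss is |m1 - m2| when the population with the smaller mean is
    chosen and 0 otherwise, so the risk is |m1 - m2| times the probability
    that the sample means are ordered the wrong way round.
    """
    gap = abs(m1 - m2)
    standard_error = sigma * math.sqrt(1.0 / n1 + 1.0 / n2)
    return gap * normal_cdf(-gap / standard_error)

# The risk vanishes when the means coincide and shrinks as the samples grow.
print(risk_f0(1.0, 0.0, 1.0, 2, 2))
print(risk_f0(1.0, 0.0, 1.0, 20, 20))
```

Note that the risk is small both when the means are nearly equal (little is lost by a wrong choice) and when they are far apart (a wrong choice is unlikely), which is why the location of the maximum risk within the parameter set matters for the minimax property.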

So much by way of a general summary. We shall now give a practical 
illustration (another is given in Section 3) to show how the problem of the greater 
mean arises in applications. 

Suppose that a consumer requires a certain number of manufactured articles 
which can be supplied at the same cost by each of two sources $\pi_1$ and $\pi_2$. The 
quality of an article is measured by a numerical characteristic $x$, and it is known 
that in the product of $\pi_i$, $x$ is normally distributed with mean $m_i$ and variance 
$\sigma^2$, but the values of these parameters are unknown. The consumer has 
obtained a random sample of $n_1$ and $n_2$ articles from $\pi_1$ and $\pi_2$ respectively, and 
has found the values of $x$ to be $(x_{11}, x_{12}, \ldots, x_{1n_1}, x_{21}, x_{22}, \ldots, x_{2n_2}) = v$. 
What is the best way of ordering a total of $N$ articles from the two sources? 





The usual statistical theory, which confines itself to estimating the unknown 
parameters and to testing hypotheses of the form $H_0(m_1 = m_2)$, has at best an 
indirect bearing on the problem at hand. We therefore adopt Wald’s point of 
view and investigate the consequences of any given course of action. If the 
consumer orders $fN$ articles from $\pi_1$ and $(1 - f)N$ from $\pi_2$, where $0 \le f \le 1$, 
then the expectation of the sum of the $x$-values in the articles he obtains will 
be $N(m_1 f + m_2(1 - f))$. The maximum possible value of this quantity is 
$N \max[m_1, m_2]$, and the “loss” per article which he sustains may therefore be 
taken as 

$W(\omega, f) = \max[m_1, m_2] - m_1 f - m_2(1 - f) \ge 0,$ 

where $\omega = (m_1, m_2 : \sigma)$ is the true parameter point. 
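
A short worked instance of the loss per article may help; this is an illustrative sketch, and the function name `loss_per_article` is an assumption rather than notation from the paper.

```python
def loss_per_article(m1, m2, f):
    """Loss W = max(m1, m2) - m1*f - m2*(1 - f) when a fraction f of the
    order is placed with the first source and 1 - f with the second."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1 inclusive")
    return max(m1, m2) - m1 * f - m2 * (1.0 - f)

# With m1 = 3 and m2 = 5, ordering a quarter of the articles from the
# poorer source costs 0.25 * (5 - 3) per article.
print(loss_per_article(3.0, 5.0, 0.25))
```

In general the loss equals $|m_1 - m_2|$ times the fraction ordered from the source with the smaller mean, so it vanishes exactly when the whole order goes to the better source (or when the means are equal).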

The consumer wants to choose $f$ so as to make $W$ as small as possible. If 
he knew $m_1$ to be greater, or to be less, than $m_2$, then by choosing $f = 1$ or $0$ 
respectively he could make $W = 0$. But since he does not know which $m_i$ is 
the greater he will presumably choose $f$ as some function of the sample point $v$. 
Suppose, therefore, that a “decision function” $f(v)$, such that $0 \le f(v) \le 1$ but 
not necessarily taking on only the values 0 and 1, is defined for all points $v$ in 
the sample space and that the consumer sets $f = f(v)$. 2 In repeated 
applications of this procedure, the “risk” or expected loss (a double expectation is 
involved: the expected loss for a given $f$ and the expected value of $f$ in using the 
decision function $f(v)$) per article is given by (1), and the consumer will try to 
find an $f(v)$ which minimizes this risk. Since the value of the risk depends on $\omega$ 
it is necessary to specify which values of $\omega$ are to be regarded as possible in 
the given problem; let the set of all such $\omega$ be denoted by $\Omega$. If the consumer 
agrees to adopt the “conservative” criterion of minimizing the maximum 
possible risk, then the statistician’s problem is to find the minimax decision 
functions in the sense defined above. We have given the solutions of this problem 
for certain types of parameter spaces. The reader will observe that each of the 
minimax decision functions (2), (3), (4) was of the “all or nothing” type, with 
values 0 and 1 only. (Whether this remains true for every $\Omega$ we do not know.) 
By using one of these decision functions in a given instance one arrives at either 
the best possible decision or the worst. The attitudes of doubt sometimes 
associated with the non-rejection of the hypothesis $H_0(m_1 = m_2)$ are therefore 

2 One might say that the consumer should choose $f$ in the light of what he can infer from 
$v$ about the $m_i$. But this formulation as a problem in ordinary statistical inference 
(estimation and testing) is not relevant and may be misleading. For example, a plausible $f(v)$, 
based on the idea that the problem is one of testing hypotheses, is as follows: “Perform the 
two-tailed $t$ test of $H_0(m_1 = m_2)$ at the five per cent level. If $H_0$ is rejected set $f = 0$ or $1$ 
according as $\bar x_1$ is less than or greater than $\bar x_2$. If $H_0$ is not rejected set $f = \tfrac12$.” Another 
$f(v)$, based on the theory of estimation, according to which the $\bar x_i$ are the “best” estimates 
of the $m_i$, is as follows: “Set $f = 0$ or $1$ according as $\bar x_1$ is less than or greater than $\bar x_2$.” 
Actually, the latter procedure is, from the remarks above concerning (5), the “best” in 
a certain definite sense and under certain conditions, but this fact does not follow from the 
usual theory of estimation. 





irrelevant to the problem of the greater mean in the examples cited. (Cf. 
footnote 2; also Example 1 in Section 3.) 

The risk function (1) is but one of a general class $R$ of risk functions, to be 
defined in Section 2, which are associated with the problem of the greater mean. 
The most important members of $R$ are (1) and 

(6) $\bar r(f \mid \omega) = P(\text{incorrect decision using } f(v) \mid \omega),$ 

where “$m_1 < m_2$” and “$m_1 > m_2$” are the two possible decisions. The risk 
function (6) is relevant to applications of a purely “scientific” nature in which the 
statistician is asked merely to give his opinion as to which population has 
the greater mean. Although the problem of constructing a suitable decision 
function for (6) is akin in spirit to the problems considered in the now classical 
Neyman-Pearson theory of statistical tests, no satisfactory solutions seem to 
be available. It is easy to see, however, that (1) and (6) are quite similar. Of 
course, in the case of (1) a decision function $f(v)$ may take on any value 
between 0 and 1 inclusive, while for (6) we allow only functions which take on 
only the values 0 and 1, corresponding respectively to the decisions “$m_1 < m_2$” 
and “$m_1 > m_2$”. We then have for any such $f(v)$, 

(6') $\bar r(f \mid \omega) = P(f(v) = 1 \mid \omega) = E[f \mid \omega]$ if $m_1 < m_2$, 
     $\bar r(f \mid \omega) = P(f(v) = 0 \mid \omega) = E[1 - f \mid \omega]$ if $m_1 > m_2$, 
     $\bar r(f \mid \omega) = 0$ if $m_1 = m_2$, 

and by comparison with (1) we see that $r(f \mid \omega) = |m_1 - m_2| \, \bar r(f \mid \omega)$ for all $\omega$. 
Now, in the three examples (i), (ii), (iii) cited above the unique minimax decision 
functions happen to take on only the values 0 and 1, and $|m_1 - m_2|$ is constant 
on each of the respective parameter sets. It follows that (2), (3), (4) are also 
the unique minimax decision functions relative to (6) and to $\Omega'$, $\Omega''$, $\Omega'''$ 
respectively. The remarks above following Example (iii) also remain valid for the 
risk function (6). 

We conclude this section with a remark on the methods of this paper. Any 
decision function relevant to (6) is equivalent to a test of the hypothesis 
$H_0(m_1 < m_2)$ against the alternative $H_1(m_1 > m_2)$, the region $\{v : f(v) = 1\}$ being the 
“critical region.” Hence the Neyman-Pearson probability ratio method can be 
used to obtain the unique minimax decision function with respect to (6) and 
an $\Omega$ consisting of two (or more) points, and the result carries over to more 
general types of $\Omega$ in the manner already indicated. It turns out, however, that 
the dominant properties of the probability ratio tests are not confined to 
the class of tests alone, but extend to the class of all functions $f(v)$ such that 
$0 \le f(v) \le 1$. This result (Theorem 1) enables us to solve the problem of the 
greater mean for the risk function (1) as well as for (6). The reader who is 
interested in applications may turn to Section 3. 


2. Theorems. We require the following slight generalization of a well-known 
result of Neyman and Pearson [2]. 





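Theorem 1. Let $\phi(v), \phi_1(v), \ldots, \phi_r(v)$ be summable functions defined on 
a measure space $E$ with points $v$ and measure $\mu$, $\mu(E) \le \infty$, let $c_1, \ldots, c_r$ be 
arbitrary constants, and let $A \subseteq E$ be such that 

(7) $v \in A$ implies $\phi(v) \ge \sum_i c_i \phi_i(v)$, 
    $v \in E - A$ implies $\phi(v) \le \sum_i c_i \phi_i(v)$. 

Let 

(8) $\int_A \phi_i \, d\mu = a_i$ $(i = 1, \ldots, r),$ 

and let $f(v)$ be any measurable function such that 

(9) $0 \le f(v) \le 1$ 

and such that 

(10) $\int_E f \phi_i \, d\mu = a_i$ $(i = 1, \ldots, r).$ 

Then 

(11) $\int_E f \phi \, d\mu \le \int_A \phi \, d\mu.$ 

Proof. 

$\int_E f\phi \, d\mu = \int_A f\phi \, d\mu + \int_{E-A} f\phi \, d\mu$ 

$\le \int_A f\phi \, d\mu + \int_{E-A} f \sum_i c_i \phi_i \, d\mu$   by (9), (7), 

$= \int_A f\phi \, d\mu + \sum_i c_i \int_{E-A} f\phi_i \, d\mu$ 

$= \int_A f\phi \, d\mu + \sum_i c_i \left[ \int_E f\phi_i \, d\mu - \int_A f\phi_i \, d\mu \right]$ 

$= \int_A f\phi \, d\mu + \sum_i c_i \left[ a_i - \int_A f\phi_i \, d\mu \right]$   by (10), 

$= \int_A f\phi \, d\mu + \sum_i c_i \int_A (1 - f)\phi_i \, d\mu$   by (8), 

$= \int_A \phi \, d\mu - \int_A (1 - f)\phi \, d\mu + \int_A (1 - f) \sum_i c_i \phi_i \, d\mu$ 

$= \int_A \phi \, d\mu + \int_A (1 - f) \left( \sum_i c_i \phi_i - \phi \right) d\mu$ 

$\le \int_A \phi \, d\mu$   by (9), (7). 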





Note 1. If the condition 

(12) $\mu\{v : \phi(v) = \sum_i c_i \phi_i(v)\} = 0$ 

holds, then in order that the equality hold in (11) it is necessary and sufficient that 

(13) $f(v) = \chi_A(v)$ a.e. ($\mu$), 

where $\chi_A(v)$ is the characteristic function of the set $A$, 

$\chi_A(v) = 1$ if $v \in A$, 
$\chi_A(v) = 0$ if $v \in E - A$. 

Proof. The sufficiency is obvious. To prove the necessity we observe from the 
proof of Theorem 1 that for equality to hold in (11) it is necessary that 

$f(v) \left( \phi(v) - \sum_i c_i \phi_i(v) \right) = 0$ a.e. ($\mu$) in $E - A$, 

and that 

$(1 - f(v)) \left[ \phi(v) - \sum_i c_i \phi_i(v) \right] = 0$ a.e. ($\mu$) in $A$. 

These relations and (12) imply (13). 

Note 2. If relations (10) are replaced by 

(10') $\int_E f \phi_i \, d\mu \le a_i$ $(i = 1, \ldots, r),$ 

and if each of the constants $c_i$ is non-negative, then Theorem 1 and Note 1 remain 
valid. 

Theorem 1 has applications to a number of decision problems of a certain 
type. In the present paper we consider only the “problem of the greater mean” 
for two normal populations with a common variance $\sigma^2$, where at least one of 
the means $m_1$, $m_2$ is unknown. The following assumptions and definitions will 
be valid henceforth. 

(A) $E_N$ is the $N = n_1 + n_2$ dimensional sample space of points 
$v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. A measurable function $f(v)$ 
defined for all $v$ in $E_N$ is a decision function if $0 \le f(v) \le 1$. $f_1(v) \equiv f_2(v)$ means 
$f_1(v) = f_2(v)$ for almost every $v$ in $E_N$. 

(B) $\Omega$ is a given set of points $\omega = (m_1, m_2 : \sigma)$, $\sigma > 0$. Given $\omega$ in $\Omega$, the 
probability measure in $E_N$ is that generated by the distribution function 

$K(v \mid \omega) = \prod_{i=1}^{2} \prod_{j=1}^{n_i} G[(x_{ij} - m_i)/\sigma],$ 

where 

$G(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2} \, du.$ 





Given any function $\phi = \phi(v)$ for which the integral exists we write 

$E[\phi \mid \omega] = \int_{E_N} \phi(v) \, dK(v \mid \omega).$ 

(C) Let $\gamma(\omega) = (g_1, g_2)$ be a function defined for all $\omega$ in $\Omega$, with values in 
the plane, and such that 

(14) $m_i \le m_j$ implies $g_i \le g_j$ $(i, j = 1, 2).$ 

Given $p$, $0 \le p \le 1$, we define 

$W(\omega, p) = \max[g_1, g_2] - g_1 p - g_2(1 - p),$ 

and given a decision function $f(v)$ we define the risk function 

(15) $r(f \mid \omega) = E[W(\omega, f) \mid \omega] = W(\omega, E[f \mid \omega])$ 
     $= \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega].$ 

The class of risk functions (15) corresponding to all functions $\gamma(\omega)$ which satisfy 
(14) is denoted by $R$. (The two most important members of $R$ are (1), with 

$\gamma(\omega) = (m_1, m_2),$ 

and (6), with 

$\gamma(\omega) = (0, 1)$ if $m_1 < m_2$, 
$\gamma(\omega) = (1, 0)$ if $m_1 > m_2$, 
$\gamma(\omega) = (0, 0)$ if $m_1 = m_2$. 

The risk functions (1) and (6) appear in the examples in Section 3.) Throughout 
this section $r(f \mid \omega)$ will denote a fixed but arbitrary member of $R$. We shall use 
the notations 

$h(\omega) = |g_1 - g_2|,$ 

$d(\omega) = \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{-1/2} (m_1 - m_2)/\sigma,$ 

$\bar x_i = n_i^{-1} \sum_{j=1}^{n_i} x_{ij}$ $(i = 1, 2).$ 
1 -1 

Theorem 2. Let $\omega_1 = (m_1, m_2 : \sigma)$ and $\omega_2 = (\mu_1, \mu_2 : \sigma)$ be two parameter points 
such that 

$d(\omega_1) < 0$, $d(\omega_2) > 0$, $h(\omega_1) h(\omega_2) > 0.$ 

For any $\lambda$, $-\infty \le \lambda \le \infty$, let $f_\lambda(v)$ be the characteristic function of the set 

(16) $A_\lambda = \{v : n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2 > \lambda\sigma\}.$ 

Then 

(i) Corresponding to any decision function $f(v)$, there exists a $\lambda$ such that 

$r(f_\lambda \mid \omega_1) = r(f \mid \omega_1)$, $r(f_\lambda \mid \omega_2) \le r(f \mid \omega_2);$ 

the inequality is strict unless $f(v) \equiv f_\lambda(v)$. 

(ii) Given any $\lambda$, if $f(v)$ is a decision function such that 

$r(f \mid \omega_i) \le r(f_\lambda \mid \omega_i)$ $(i = 1, 2),$ 

then 

$f(v) \equiv f_\lambda(v).$ 

(iii) There exists a unique $c$ such that 

(17) $r(f_c \mid \omega_1) = r(f_c \mid \omega_2) = B$ say, 

and for any decision function $f(v)$ we have 

(18) $B \le \max[r(f \mid \omega_1), r(f \mid \omega_2)];$ 

the inequality is strict unless $f(v) \equiv f_c(v)$. It follows that $f_c(v)$ is the unique minimax 
decision function corresponding to the two-point parameter space $\Omega = (\omega_1, \omega_2)$. 
Proof. 3 (a) Let $\phi(v)$, $\phi_1(v)$ be the joint frequency functions of the sample 
point $v$ corresponding to the parameter points $\omega_2$, $\omega_1$ respectively. It is readily 
seen that for any $\lambda$ there exists a unique constant $c_1(\lambda)$, $0 \le c_1(\lambda) \le \infty$, such 
that 

$A_\lambda = \{v : \phi(v) > c_1 \phi_1(v)\}$ 

($c_1(-\infty) = 0$, $c_1(\infty) = \infty$). Moreover, since $\omega_1 \ne \omega_2$, 

$\mu\{v : \phi(v) = c_1 \phi_1(v)\} = 0.$ 

It follows from Theorem 1, Note 2, that if $f(v)$ is any decision function such 
that 

$E[f \mid \omega_1] \le E[f_\lambda \mid \omega_1],$ 

then 

$E[f \mid \omega_2] \le E[f_\lambda \mid \omega_2],$ 

and the strict inequality holds unless $f(v) \equiv f_\lambda(v)$. 

(b) It is clear from the definition (16) that for any fixed parameter point $\omega$ 
the function 

$E[f_\lambda \mid \omega] = P(A_\lambda \mid \omega)$ 

is continuous and strictly decreasing from 1 to 0 as $\lambda$ varies from $-\infty$ to $+\infty$. 

(c) For any decision function $f(v)$ and any parameter point $\omega$ we have by (C), 

$r(f \mid \omega) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega].$ 

Hence 

(19) $r(f \mid \omega_1) = h(\omega_1) E[f \mid \omega_1]$, $h(\omega_1) > 0$, 
     $r(f \mid \omega_2) = h(\omega_2) E[1 - f \mid \omega_2]$, $h(\omega_2) > 0$. 
3 Theorem 2 (as also Example (iii) of Section 1) could be derived from Wald’s general 
results on the completeness of the class of Bayes solutions of statistical decision problems. 





Since for any decision function $f(v)$, $0 \le E[f \mid \omega_1] \le 1$, we can by (b) choose $\lambda$ 
so that 

(20) $E[f_\lambda \mid \omega_1] = E[f \mid \omega_1],$ 

and by (a) it follows that unless $f(v) \equiv f_\lambda(v)$, 

(21) $E[f_\lambda \mid \omega_2] > E[f \mid \omega_2].$ 

(i). Follows from (19), (20) and (21). 

(ii). Follows from (19) and (a). 

(iii). (17) follows from (19) and (b). Then (18) follows from (17) and (ii). 

Theorem 2 provides the solution of any problem of the greater mean when $\Omega$ 
consists of just two points $\omega_1$, $\omega_2$. For, the problem is trivial unless 
$d(\omega_1) d(\omega_2) < 0$ and $h(\omega_1) h(\omega_2) > 0$, and in the non-trivial case the unique minimax 
decision function is $f_c(v)$ defined by (17). Moreover, it follows at once from the 
definition that if $\bar f(v)$ is the unique minimax decision function with respect to some 
parameter set $\bar\Omega$, then it remains so with respect to any $\Omega$ such that $\Omega \supseteq \bar\Omega$ and 

$\sup_{\omega \in \Omega} [r(\bar f \mid \omega)] = \sup_{\omega \in \bar\Omega} [r(\bar f \mid \omega)].$ 

By taking sets $\bar\Omega$ which consist of two points, Theorem 2 can therefore be used 
to obtain sufficient conditions for an $\bar f(v) = f_c(v)$ to be the unique minimax 
decision function with respect to a quite general $\Omega$. (It is clear that results 
analogous to Theorem 2(iii) but pertaining to more than two parameter points 
can be derived from Theorem 1, and that these results can be exploited in a 
similar way. An instance of this procedure where $\bar\Omega$ consists of three points will 
be given at the end of this section.) 
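
The constant $c$ of Theorem 2(iii) can be found numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the risk function (1), so that $h(\omega) = |m_1 - m_2|$; it uses the fact that the linear statistic in (16) is normally distributed; and it locates $c$ by bisection, relying on the monotonicity noted in part (b) of the proof. The function name `two_point_minimax` and the bracketing interval are assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_point_minimax(omega1, omega2, sigma, n1, n2, iterations=200):
    """Return (c, B) for omega1 = (m1, m2), omega2 = (mu1, mu2), risk (1).

    T = n1*(mu1 - m1)*xbar1 + n2*(mu2 - m2)*xbar2 is normal, and the rule
    decides "first mean greater" when T > c * sigma.
    """
    (m1, m2), (u1, u2) = omega1, omega2
    if not (m1 < m2 and u1 > u2):
        raise ValueError("need m1 < m2 at omega1 and mu1 > mu2 at omega2")
    a, b = n1 * (u1 - m1), n2 * (u2 - m2)          # coefficients of xbar1, xbar2
    sd = sigma * math.sqrt(a * a / n1 + b * b / n2)
    mean1 = a * m1 + b * m2                        # mean of T under omega1
    mean2 = a * u1 + b * u2                        # mean of T under omega2
    h1, h2 = m2 - m1, u1 - u2                      # h(omega) for risk (1)

    def risks(lam):
        t = lam * sigma
        # wrong decision at omega1 is T > t; at omega2 it is T <= t
        return h1 * normal_cdf((mean1 - t) / sd), h2 * normal_cdf((t - mean2) / sd)

    lo = (min(mean1, mean2) - 40.0 * sd) / sigma
    hi = (max(mean1, mean2) + 40.0 * sd) / sigma
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        r1, r2 = risks(mid)
        if r1 > r2:        # risk at omega1 still too large: raise the threshold
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, max(risks(c))

c, B = two_point_minimax((0.0, 1.0), (1.0, 0.0), 1.0, 4, 4)
print(c, B)
```

In the symmetric case shown, the threshold is zero and the rule reduces to comparing the two sample means.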

The theorems which follow exploit Theorem 2 in this way to obtain conditions 
on $\Omega$ under which the decision functions $f^0(v)$ and $f_c^0(v)$ defined by (5) and (3) 
are minimax. We consider $f^0(v)$ first. From (C) we have, after a simple 
computation, 

(22) $r(f^0 \mid \omega) = h(\omega) \cdot G(-|d(\omega)|).$ 

Theorem 3. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega_k'\}$ of points 
$\omega_k = (m_{1k}, m_{2k} : \sigma_k)$, $\omega_k' = (\mu_{1k}, \mu_{2k} : \sigma_k)$ in $\Omega$ such that 

(i) $\lim_{k \to \infty} r(f^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)]$ $(\ne 0, \infty),$ 

(ii) $d(\omega_k) = -d(\omega_k')$, $h(\omega_k) = h(\omega_k')$, and $n_1 m_{1k} + n_2 m_{2k} = n_1 \mu_{1k} + n_2 \mu_{2k}$ for 
every $k = 1, 2, \ldots$. 

Then $f^0(v)$ is an admissible minimax decision function. If there exist 
$\omega_0 = (m_1, m_2 : \sigma)$, $\omega_0' = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f^0(v)$ is the 
unique minimax decision function. 

Proof. By (22) and (ii), 

(23) $r(f^0 \mid \omega_k) = r(f^0 \mid \omega_k')$ for every $k$. 





Without loss of generality, we may assume the two sequences to be so chosen 
that $h(\omega_k) = h(\omega_k') > 0$ for every $k$. Then, by interchanging corresponding 
members if necessary, we may assume that 

(24) $d(\omega_k) = -d(\omega_k') < 0$ for every $k$. 

Consider the two points $\omega_k$, $\omega_k'$ in $\Omega$ with arbitrary but fixed $k$. Writing $\omega_k$, 
$\omega_k'$ for $\omega_1$, $\omega_2$ respectively, and using conditions (ii), a simple calculation shows 
that the set defined by (16) is 

(25) $A_\lambda = \{v : \bar x_1 - \bar x_2 > L\},$ 

$L$ being a strictly increasing function of $\lambda$. 

Choose and fix an arbitrary decision function $f(v) \not\equiv f^0(v)$. Comparing (5) and 
(25), it follows from Theorem 2(iii) and (23) that 

(26) $r(f^0 \mid \omega_k) = r(f^0 \mid \omega_k') < \max[r(f \mid \omega_k), r(f \mid \omega_k')].$ 

Clearly, $f(v)$ cannot be uniformly better than $f^0(v)$ in $\Omega$. Again, from (26), 

(27) $r(f^0 \mid \omega_k) < \sup_{\omega \in \Omega} [r(f \mid \omega)],$ 

so that, since $k$ is arbitrary, 

(28) $\sup_{\omega \in \Omega} [r(f^0 \mid \omega)] = \lim_{k \to \infty} r(f^0 \mid \omega_k) \le \sup_{\omega \in \Omega} [r(f \mid \omega)].$ 

Since $f(v) \not\equiv f^0(v)$ in the preceding argument is arbitrary, we have shown that 
(a) no $f(v)$ can be uniformly better than $f^0(v)$ and (b) 
$\sup_{\omega} [r(f^0 \mid \omega)] = \inf_f \sup_{\omega} [r(f \mid \omega)]$, i.e. that $f^0(v)$ is admissible and minimax. 
The last part of the theorem follows upon setting $\omega_k = \omega_0$ in (27). This completes 
the proof of Theorem 3. 

The conditions on $\Omega$ for $f^0(v)$ to be the unique minimax decision function may 
be written as follows: 

There exist $\omega_0 = (m_1, m_2 : \sigma)$, $\omega_0' = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ such that 

(29) (i) $r(f^0 \mid \omega_0) \, (= r(f^0 \mid \omega_0')) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)]$ $(\ne 0, \infty),$ 

     (ii) $\mu_1 = m_2 + \left( \frac{n_1 - n_2}{n_1 + n_2} \right)(m_1 - m_2)$, $\mu_2 = m_1 + \left( \frac{n_1 - n_2}{n_1 + n_2} \right)(m_1 - m_2),$ 

     (iii) $h(\omega_0) = h(\omega_0').$ 

For the important risk functions (1) and (6), (29)(ii) implies (29)(iii) (i.e., $h(\omega)$ 
depends on $|m_1 - m_2|$ alone). Moreover, when $n_1 = n_2$, (29)(ii) becomes $\mu_1 = m_2$, 
$\mu_2 = m_1$. Thus for (1) and (6), when $n_1 = n_2$ the conditions (29) reduce simply 
to the condition that at least two points in $\Omega$ at which the risk for $f^0(v)$ is maximum 
be image points of one another in the plane $\{\omega : m_1 = m_2\}$. In particular, it follows 
that if $n_1 = n_2$ and if the given set $\Omega$ is “symmetric” in the sense that whenever 
$(m_1, m_2 : \sigma)$ is in $\Omega$ then $(m_2, m_1 : \sigma)$ is also in $\Omega$, then $f^0(v)$ is the unique minimax 
decision function provided that it attains its maximum risk in $\Omega$, the risk function 
in question being (1) or (6). There are obvious modifications (involving two 
sequences of points in $\Omega$) of these remarks which assert that $f^0(v)$ is at least an 
admissible minimax decision function in case $f^0(v)$ does not attain its maximum 
risk in $\Omega$. 
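
The partner point required by the conditions above can be computed directly. The following sketch is illustrative only (the name `partner_point` is an assumption): given $(m_1, m_2)$ and the sample sizes, it returns the pair $(\mu_1, \mu_2)$ for which the difference of means is reversed, $\mu_1 - \mu_2 = -(m_1 - m_2)$, while the weighted sum is preserved, $n_1\mu_1 + n_2\mu_2 = n_1 m_1 + n_2 m_2$. These are the two requirements of Theorem 3(ii) when both points share the same $\sigma$.

```python
def partner_point(m1, m2, n1, n2):
    """Return (mu1, mu2) with mu1 - mu2 = -(m1 - m2) and
    n1*mu1 + n2*mu2 = n1*m1 + n2*m2."""
    shift = (n1 - n2) / (n1 + n2) * (m1 - m2)
    return m2 + shift, m1 + shift

# With equal sample sizes the partner is the mirror image (m2, m1).
print(partner_point(1.0, 3.0, 5, 5))
# With unequal sample sizes the mirror image is shifted along the diagonal.
print(partner_point(1.0, 4.0, 2, 6))
```

With unequal sample sizes, then, symmetry of the parameter set under interchange of the two means is not by itself the relevant condition; the set must contain the shifted partner of a point of maximum risk.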

We shall now state the result analogous to Theorem 3 for the case when one 
of the means is known exactly, say $m_2 = c$. The decision function $f_c^0(v)$ is defined 
by (3). 

Theorem 4. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega_k'\}$ of points 
$\omega_k = (c + a_k, c : \sigma_k)$, $\omega_k' = (c - a_k, c : \sigma_k)$ in $\Omega$ such that 

(i) $\lim_{k \to \infty} r(f_c^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)]$ $(\ne 0, \infty),$ 

(ii) $h(\omega_k) = h(\omega_k')$ for every $k = 1, 2, \ldots$. 

Then $f_c^0(v)$ is an admissible minimax decision function. If there exist 
$\omega_0 = (c + a, c : \sigma)$, $\omega_0' = (c - a, c : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f_c^0(v)$ is the 
unique minimax decision function. 

The proof (based on Theorem 2(iii)) is similar to that of Theorem 3 and will 
be omitted. Note that for the risk functions (1) and (6), condition (ii) is 
automatically satisfied. 

The reader will have observed that results which may be obtained from 
Theorem 2(iii) in the manner of Theorems 3 and 4 will assert the optimal 
character of decision functions which are characteristic functions of sets of the type 
$\{v : a\bar x_1 + b\bar x_2 > c\}$. The following example, cited as Example (iii) of Section 1, 
shows that for arbitrary $\Omega$ the optimum decision function need not be of this 
type. 

Suppose that $n_1 = n_2 = n$, that $\Omega$ consists of the three points 

$\omega_0 = (\tfrac12, -\tfrac12 : 1)$, $\omega_1 = (\tfrac12, \tfrac32 : 1)$, $\omega_2 = (-\tfrac32, -\tfrac12 : 1),$ 

and that the risk function under consideration is given by (1) or (6). Then the 
unique minimax decision function is $f^{**}(v)$ given by (4), where $\lambda > 0$ is 
determined by 

(30) $E[1 - f^{**} \mid \omega_0] = E[f^{**} \mid \omega_1].$ 

The proof follows. $f^{**}(v)$ is the characteristic function of the set 
$\{v : \phi(v) > c_1 \phi_1(v) + c_2 \phi_2(v)\}$, where $\phi$, $\phi_1$, $\phi_2$ are the frequency functions of the 
probability distributions in $E_{2n}$ corresponding to the parameter points $\omega_0$, $\omega_1$, $\omega_2$ 
respectively, with $c_1 = c_2 = e^{n}/\lambda$. Since for all $\lambda > 0$, 

$E[f^{**} \mid \omega_1] = E[f^{**} \mid \omega_2],$ 

and since a unique $\lambda > 0$ satisfying (30) certainly exists, it follows (cf. (19) and 
(C)) that 

$r(f^{**} \mid \omega_0) = r(f^{**} \mid \omega_1) = r(f^{**} \mid \omega_2) = B,$ 





say. Let $f(v)$ be any decision function $\not\equiv f^{**}(v)$. We shall show that 

(31) $B < \max[r(f \mid \omega_0), r(f \mid \omega_1), r(f \mid \omega_2)].$ 

Suppose not. Then 

$r(f \mid \omega_1) = E[f \mid \omega_1] \le E[f^{**} \mid \omega_1] = r(f^{**} \mid \omega_1),$ 

$r(f \mid \omega_2) = E[f \mid \omega_2] \le E[f^{**} \mid \omega_2] = r(f^{**} \mid \omega_2).$ 

Then, by Theorem 1, Note 2, we must have $E[f \mid \omega_0] < E[f^{**} \mid \omega_0]$, so that 

$r(f \mid \omega_0) = 1 - E[f \mid \omega_0] > 1 - E[f^{**} \mid \omega_0] = r(f^{**} \mid \omega_0) = B,$ 

contrary to hypothesis. Hence (31) holds, and since $f(v) \not\equiv f^{**}(v)$ is arbitrary 
our assertion is proved. (Note that 

$r(f^0 \mid \omega_0) = r(f^0 \mid \omega_1) = r(f^0 \mid \omega_2)$ 

also, so that $f^{**}(v)$ is uniformly better than $f^0(v)$ in $\Omega$.) We remind the reader 
that $f^{**}(v)$ remains the unique minimax decision function with respect to (1) 
or (6) and any $\Omega$ which contains $\omega_0$, $\omega_1$, $\omega_2$ and is such that 
$\sup_{\omega \in \Omega} [r(f^{**} \mid \omega)] = B$. 
Whether a set $\Omega$ satisfies the last condition will in general depend on whether the 
risk function in question is (1) or (6). 

3. Examples and discussion. In this section we shall discuss the relevance of 
Theorems 3 and 4 to two specific problems of the greater mean. The examples 
given are purely illustrative and the reader will readily construct others in which
the statistician is faced with similar problems of decision. 

Example 1. A farmer F has tested two varieties $\pi_1, \pi_2$ of grain in a field experiment in which $n_i$ plots were assigned to $\pi_i$, $i = 1, 2$, all plots being of equal area. The plot yields obtained were $y_{11}, y_{12}, \ldots, y_{1n_1}$ and $y_{21}, y_{22}, \ldots, y_{2n_2}$ bushels respectively. F gives this data to a statistician S for analysis. F is willing to assume that the yields per plot for each of the two varieties are normally distributed with unknown means $\mu_1, \mu_2$ and a common variance, also unknown. F says he is particularly interested in whether the two varieties are “significantly different.”

S is well aware that F's interest in the varieties is not purely scientific; that is to say, F did not perform the field experiment for the sole purpose of estimating the unknown parameters or testing hypotheses concerning them. S also knows that it is very unlikely that $\mu_1$ is equal to $\mu_2$.

Suppose that in fact F wishes to decide which variety he should use next year on his land in order to make the maximum possible profit, and is afraid that if he were to act as if the observed mean yields $\bar{y}_1, \bar{y}_2$ were the true population mean yields, he might make a gross error. So F is willing to compromise between the two varieties (that is, he will assign some fraction $f$ of his land to $\pi_1$ and the rest to $\pi_2$) in case S declares that there is no evidence of the two varieties being different.





If this is the case, S should ask F how much it costs him to use $\pi_i$ and the price at which he expects to sell his grain. Supposing that these quantities are $a_i$ dollars per acre and $b$ dollars per bushel respectively, and that the area of each plot in the field experiment was $c$ acres, S will set

$m_i$ = expected profit per acre in using variety $\pi_i$ = $(b/c)\mu_i - a_i$ dollars $(i = 1, 2)$,

$\omega = (m_1, m_2 : \sigma)$, $\sigma^2$ being the variance of the profit per acre in using $\pi_i$ $(i = 1, 2)$,

$\gamma(\omega) = (m_1, m_2)$ (see Section 2, (C)),

$x_{ij} = (b/c)y_{ij} - a_i$, $\bar{x}_i = n_i^{-1}\sum_{j=1}^{n_i} x_{ij}$, $v = (x_{11}, \ldots, x_{1n_1}, x_{21}, \ldots, x_{2n_2})$,

so that $r(f \mid \omega)$ is given by (1) and is equal to the expected loss (in terms of profit per acre) incurred by using the proportions $f(v)$, $1 - f(v)$ of the varieties $\pi_1, \pi_2$ as compared with using the variety with the greater mean for the whole of the land. Then if S is satisfied that the set $\Omega$ of possible points $\omega$ satisfies the conditions of Theorem 3 he should recommend that F use $\pi_1$ alone if $\bar{x}_1 > \bar{x}_2$, and $\pi_2$ alone if $\bar{x}_2 > \bar{x}_1$, this being the safest procedure in the sense that it is the minimax strategy (cf. Example 1 in [3]).
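The recommendation just described may be sketched numerically. What follows is only an illustration in Python, not part of the original argument: the function name `recommend_variety` and its argument names are hypothetical, the yields are assumed to be supplied as plain lists of bushels per plot, and an exact tie (an event of probability zero under the normal model) is arbitrarily resolved in favor of the second variety.

```python
from statistics import mean


def recommend_variety(yields_1, yields_2, a_1, a_2, b, c):
    """Return 1 or 2: the variety with the larger observed mean profit per acre.

    yields_i : plot yields (bushels) for variety i
    a_i      : cost in dollars per acre of using variety i
    b        : selling price in dollars per bushel
    c        : area of each experimental plot in acres
    """
    # x_ij = (b / c) * y_ij - a_i is the observed profit per acre on plot j.
    x_bar_1 = mean((b / c) * y - a_1 for y in yields_1)
    x_bar_2 = mean((b / c) * y - a_2 for y in yields_2)
    # Use variety 1 alone if x_bar_1 > x_bar_2, otherwise variety 2 alone.
    return 1 if x_bar_1 > x_bar_2 else 2
```

Note that the comparison is made on profit per acre and not on raw yield, so a variety with the smaller mean yield may still be recommended when it is sufficiently cheaper to grow.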

We shall illustrate by a simple example the obvious method of verifying whether $f^0(v)$ is the minimax decision function for a given $\Omega$. We have by (22), using the risk function (1) obtained by setting $\gamma(\omega) = (m_1, m_2)$,

(32) $r(f^0 \mid \omega) = h(\omega)G(-|d(\omega)|) = |m_1 - m_2| \, G\left(-\left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma\right)$.


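Now suppose that

(33) $\Omega = \{\omega: a - \tfrac{l}{2} \le m_1 \le a + \tfrac{l}{2},\ b - \tfrac{l}{2} \le m_2 \le b + \tfrac{l}{2} : \sigma_0 - \rho \le \sigma \le \sigma_0\}$,

where $a, b, l, \sigma_0, \rho\ (>0)$ are certain constants. By (32), the maximum risk occurs at some points in $\Omega$ for which $\sigma = \sigma_0$. We have

(34) $r(f^0 \mid \sigma = \sigma_0) = \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2} [xG(-x)]$,

where

$x = x(\omega) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma_0$.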





If $a = b$ and $n_1 = n_2$ we see from the remark following (29) that $f^0(v)$ is the unique minimax decision function. Suppose, therefore, that $a \ne b$ or $n_1 \ne n_2$ or both. Now

(35) $\sup_x [xG(-x)] = x_0 G(-x_0) = .1700$ (approx.),

where $x_0 = .7518$ (approx.). If $m_1, m_2$ were unrestricted, $r(f^0 \mid \sigma = \sigma_0)$ would be a maximum when $|m_1 - m_2| = x_0 \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$, by (34) and (35). Hence $f^0(v)$ will be the unique minimax decision function if these two lines intersect the square $a - \tfrac{l}{2} \le m_1 \le a + \tfrac{l}{2}$, $b - \tfrac{l}{2} \le m_2 \le b + \tfrac{l}{2}$ in such a way that at least two points lying on these lines and in the square satisfy (29) (ii). This will be the case if

n i — n 2 1 I 

4 


(3G) l > max 
where 


| a - b | + 7/o, max (| a - b | , ijo) -|- 

(- I- --Y. 

\»i nj 


Til “b Til 


TJo — XqCq 


We have assumed that $l > |a - b|$, for otherwise either $m_1 < m_2$ or $m_1 > m_2$ for all $\omega$ in $\Omega$, and there is no problem. It is therefore clear that for $n_1$ and $n_2$ sufficiently large, $f^0(v)$ will be the unique minimax decision function. That (36) is not a very strong requirement may be seen by setting $a = b$, $n_1 = 2n_2$, in which case (36) reduces to


l > Co 



(approx.). 


We remark that $f^0(v)$ remains the unique minimax decision function for any $n_1, n_2$ “when $l = \infty$” so that $\Omega$ is given by

(33') $\Omega = \{\omega: -\infty < m_1 < \infty,\ -\infty < m_2 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0\}$.


It is of interest to consider the “one sample” case when one of the means is known, say $m_2 = c$. This will be the case (approximately) if $\pi_2$ is a standard variety which has been in use for some time and $\pi_1$ is a new variety. The analogue of the parameter space discussed above is then


(37) 


By using Theorem 4 it can be seen that $f_c(v)$ as defined by (3) is the unique minimax decision function if $c = a$ or if $c$ is not necessarily equal to $a$, but

(38) >«.(s)'. 

where $x_0$ is given by (35). Since the left-hand side of (38) is positive, it is clear that $f_c(v)$ will be the unique minimax decision function with respect to (37) if $n_1$ is sufficiently large. Note that $f_c(v)$ is the unique minimax decision function for any $n_1$ when $l = \infty$ and $\Omega$ is given by

(37') $\Omega = \{\omega: m_2 = c,\ -\infty < m_1 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0\}$.

The reader may find it instructive to consider other plausible sets $\Omega$ which satisfy the conditions of Theorems 3 and 4 and also some which do not, assuming $\sigma = 1$ for simplicity. It should be observed that no matter what $\Omega$ may be, provided only that $\sigma \le \sigma_0$ for all $\omega$ in $\Omega$, we shall have by (32) and (35)

$\sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \le .1700 \cdot \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$ (approx.).
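The constants in (35) and the bound just stated are easily checked by machine. The following Python sketch is offered only as such a check: it assumes that $G$ is the standard normal distribution function, and the names `G`, `maximize_x_G` and `risk_bound` are hypothetical. The maximum of $xG(-x)$ is located by bisection on the derivative $G(-x) - x\,g(x)$, where $g$ is the standard normal density.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_density(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def maximize_x_G():
    """Return (x0, x0 * G(-x0)), the maximizer and maximum of x * G(-x) for x > 0."""
    # d/dx [x G(-x)] = G(-x) - x * density(x); it is positive at 0.1, negative
    # at 2.0, and has a single sign change in between, so bisection applies.
    lo, hi = 0.1, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if G(-mid) - mid * normal_density(mid) > 0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, x0 * G(-x0)


def risk_bound(sigma_0, n_1, n_2):
    """Upper bound on the risk (1) of the rule 'choose the larger sample mean'."""
    _, peak = maximize_x_G()
    return peak * sigma_0 * math.sqrt(1.0 / n_1 + 1.0 / n_2)
```

Since the bound decreases like the square root of $1/n_1 + 1/n_2$, quadrupling both sample sizes halves the largest possible expected loss.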


In a similar way it can be seen that for any $\Omega$ in which $m_2$ equals $c$ and $\sigma \le \sigma_0$


sup [r(/° | w)] < .1700-c 0 ' 

o> 



(approx ) 


Example 2. $\pi_1$ and $\pi_2$ are two soporific drugs, the random variables generated by them being the duration of sleep induced by a standard dose in an individual chosen at random. It is assumed that these two populations are normal with unknown means $m_1, m_2$ and a common variance $\sigma^2$, also unknown. In a series of independent trials in which $n_1$ individuals received the first drug and $n_2$ the second, the outcome was $v = (x_{11}, x_{12}, \ldots, x_{1n_1}, x_{21}, x_{22}, \ldots, x_{2n_2})$. The statistician S is required to say which is the more effective drug.
Here a reasonable risk function is (6), where $f(v)$ takes on only the values 0, 1, corresponding to the decisions “$m_1 < m_2$” and “$m_1 > m_2$” respectively. The problem of choosing $f(v)$ so as to minimize this risk was considered by Simon [4]. He showed that in case $n_1 = n_2$, $f^0(v)$ is the uniformly best decision function in the class of symmetric decision functions. (Given $n_1 = n_2 = n$, a decision function $f(v)$ is said to be symmetric if $f(x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n}) = 1 - f(x_{21}, x_{22}, \ldots, x_{2n}; x_{11}, x_{12}, \ldots, x_{1n})$. See also [3].) It is natural to confine oneself to the class of symmetric decision functions when the sample sizes are equal, but under the implicit assumption that if $\omega = (a, b: \sigma)$ is a possible parameter point, then $\omega' = (b, a: \sigma)$ is also (cf. the remarks following (29)). The illustrations in Section 1 show that if the sample sizes are unequal or if $\Omega$ is not symmetric in the sense just described, there may exist decision functions which are uniformly better than $f^0(v)$: in (i) we have a “symmetric” $\Omega$ but $n_1 \ne n_2$; in (iii), $n_1 = n_2$ but $\Omega$ is not “symmetric.”

However, $f^0(v)$ is an admissible minimax decision function no matter what the sample sizes, provided only that $\Omega$ satisfies a certain not too restrictive condition. We have

(39) $\bar{r}(f^0 \mid \omega) = \begin{cases} G(-|d(\omega)|) & \text{for } m_1 \ne m_2, \\ 0 & \text{for } m_1 = m_2. \end{cases}$


⁴ For some purposes it would be more appropriate to take (1) as the risk function for this problem, letting the decision functions $f(v)$ take on only the values 0 and 1. We have (essentially) discussed this case in the previous example.





It is clear that if $\{\omega_k\}$ is a sequence of points in $\Omega$ such that

$\lim d(\omega_k) = 0$, then $\lim \bar{r}(f^0 \mid \omega_k) = \tfrac{1}{2} = \sup [\bar{r}(f^0 \mid \omega)]$.

Therefore, by Theorem 3, $f^0(v)$ is admissible and minimax if some point in the plane $\{\omega: m_1 = m_2\}$ is an interior point of the set $\Omega$ of possible parameter points (in fact it is sufficient if some plane $\sigma = \sigma_0\ (>0)$ intersects $\Omega$ in a set which has an interior point on the line $m_1 = m_2$). Hence if nothing much is known about the two drugs, S could regard the foregoing as a justification for asserting “$m_1 > m_2$” if $\bar{x}_1 > \bar{x}_2$ and “$m_1 < m_2$” otherwise.

We have given no criterion for the choice of a suitable decision function when two or more admissible minimax decision functions exist, and our diffidence in recommending the use of $f^0(v)$ in the present case is due to the fact that under the condition stated above there will exist decision functions other than $f^0(v)$ which are also admissible and minimax with respect to (6). Let us suppose that $\Omega$ is given by (33). Then $f^0(v)$ is admissible and minimax, by the preceding paragraph. However, it follows from Theorem 4 that each of




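$f_{c_1}(v) = \begin{cases} 1 & \text{if } \bar{x}_1 > c_1, \\ 0 & \text{otherwise,} \end{cases}$ and $f'_{c_2}(v) = \begin{cases} 0 & \text{if } \bar{x}_2 > c_2, \\ 1 & \text{otherwise,} \end{cases}$

is also admissible and minimax, where $c_1$ and $c_2$ are arbitrary constants with

$\max [a, b] - \tfrac{l}{2} < c_1, c_2 < \min [a, b] + \tfrac{l}{2}$.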

There is, however, some reason for preferring $f^0(v)$ to other decision functions in the present case. S has been asked to give his opinion as to which is the better drug, and presumably no immediate consequences follow from the opinion which he might express. (This would not be the case if there were a sleepless individual on hand who had to be given a dose of one of the two drugs. Cf. footnote 4.) Although the problem is of a scientific nature, insistence upon literal exactitude in the interpretation of “incorrect decision” is meaningful only insofar as it is compatible with the physical situation. In view of the limited determinacy of unknown parameters in general, and of the limitations of experiments on soporific drugs in particular, it may be possible and even desirable to modify (6) in such a way that for any fixed $\sigma$ the risk tends to zero with $|m_1 - m_2|$. Thus modified, the risk function would be essentially similar to (1). A rather drastic way of introducing this modification would be to agree that the assertion of equality of the two means does not constitute an error in case $|m_1 - m_2| \le \epsilon$, where $\epsilon$ is some positive constant. S will then take

(40) $\bar{r}_\epsilon(f \mid \omega) = \begin{cases} \bar{r}(f \mid \omega) & \text{if } |m_1 - m_2| > \epsilon, \\ 0 & \text{otherwise,} \end{cases}$

as the risk function. (Note that in using $\bar{r}_\epsilon(f \mid \omega)$ rather than $\bar{r}(f \mid \omega)$, S has in effect deleted the set $\{\omega: |m_1 - m_2| \le \epsilon\}$ from the given set $\Omega$ by defining $\gamma(\omega) =$





$(0, 0)$ there, instead of only when $m_1 = m_2$ as in the case of $\bar{r}(f \mid \omega)$. Cf. “zones of indifference,” [5, pp. 27-30].) It follows from Theorem 3 that $f^0(v)$ is the unique minimax decision function with respect to (40) and (33) if $a = b$ and $n_1 = n_2$ and also if at least one of these conditions does not hold but

7 ii — n 2 

Til + 712 

Thus $f^0(v)$ will be the unique minimax decision function no matter what $n_1$, $n_2$, $a$, $b$ or $l$ may be, provided only that $\epsilon$ is sufficiently small. We shall leave other modifications of $\bar{r}(f \mid \omega)$ and discussion of $\bar{r}(f \mid \omega)$ with respect to other types of parameter spaces (e.g. (37)) to the reader.

We conclude this discussion with a remark on the proper choice of $n_1$ and $n_2$ in using $f^0(v)$ when the risk function belongs to the class $R$ defined in Section 2, (C). (The risk functions (1) and (6) belong to $R$.) Suppose that before experimentation starts, it is agreed that one must have $n_1 + n_2 = 2k$, where $k$ is a fixed integer. In that case, choosing $n_1 = n_2 = k$ will be the best choice of $n_1$, $n_2$ in the following sense. (a) For any fixed $\omega$, $r(f^0 \mid \omega)$, which is the expected loss, then becomes a minimum. This follows immediately from (22), since

$r(f^0 \mid \omega) = h(\omega)G(-|d(\omega)|)$, $|d(\omega)| = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma$,

and $|d(\omega)|$ has its maximum when $n_1 = n_2 = k$. (b) For any fixed $\omega$, the variance of the loss also becomes a minimum. In using $f^0(v)$, the loss takes the values 0 and $h(\omega)$ only, with $P(\text{loss} = h(\omega) \mid \omega) = G(-|d(\omega)|) = \alpha$ say. Therefore, the variance of the loss is $h^2\alpha(1 - \alpha)$. Since $\alpha \le \tfrac{1}{2}$, this expression increases with increasing $\alpha$, and so has its minimum when $n_1 = n_2 = k$. This remark is, of course, without prejudice to the question of whether $f^0(v)$ is admissible and minimax with respect to a given $\Omega$ for every $n_1$ and $n_2$ with $n_1 + n_2 = 2k$.
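The remark on equal allocation may be illustrated by direct computation. The Python sketch below is an illustration only: it assumes the risk function (1), so that $h(\omega) = |m_1 - m_2|$, takes $G$ to be the standard normal distribution function, and uses the hypothetical names `risk_and_variance` and `best_split`.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def risk_and_variance(m_1, m_2, sigma, n_1, n_2):
    """Expected loss and variance of the loss for the rule 'choose the larger sample mean'."""
    h = abs(m_1 - m_2)                                   # loss when the wrong mean is chosen
    d = h / (sigma * math.sqrt(1.0 / n_1 + 1.0 / n_2))   # |d(omega)|
    alpha = G(-d)                                        # probability of the wrong choice
    return h * alpha, h * h * alpha * (1.0 - alpha)


def best_split(m_1, m_2, sigma, k):
    """Among all (n_1, n_2) with n_1 + n_2 = 2k, return the split of smallest expected loss."""
    splits = [(n_1, 2 * k - n_1) for n_1 in range(1, 2 * k)]
    return min(splits, key=lambda s: risk_and_variance(m_1, m_2, sigma, *s)[0])
```

For instance, `best_split(1.0, 0.0, 1.0, 5)` compares the nine possible allocations of ten observations and returns the equal one, in agreement with (a); the same split also minimizes the second component returned by `risk_and_variance`, in agreement with (b).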


l > max 


I a ~ 


5 | + e, max (| a - b ], e) + 



4. A remark on randomized decision functions. In the foregoing discussion we have confined attention to the class of non-randomized decision functions: the space of possible decisions being some subset of $0 \le f \le 1$, the statistician constructs (in advance) a suitable decision function $f(v)$, obtains a particular sample point $v$ by sampling the two populations, and takes $f(v)$ as his decision. It is, however, of some theoretical interest to consider more general formulations in which the decision arrived at by the statistician may be a random function of the sample point $v$.

A randomized decision function can be defined in several ways. One definition is as follows. Let $\phi(z \mid v)$ be a function defined for all $v$ in $E_N$ and all real $z$ such that for any fixed $z$ it is a measurable function of $v$, and such that for any fixed $v$ it is the distribution function of a random variable with values in $0 \le z \le 1$. We shall denote this random variable by $Z_\phi(v)$ and call it a (randomized) decision function. In using it, the statistician first obtains a particular point $v$ by sampling the two populations, then performs a random experiment whose outcome $Z$





has the known distribution function $P(Z \le z) = \phi(z \mid v)$, and takes $Z$ as his decision. The class of all decision functions corresponding to all functions $\phi(z \mid v)$ will be denoted by $\{Z_\phi(v)\}$. It is clear that this class includes the class of non-randomized decision functions.
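The two-stage procedure just described may be sketched as follows. The sketch is in Python and covers only the simplest special case, assumed here for illustration, in which $\phi(z \mid v)$ is a two-point distribution: $Z = 1$ with probability $p(v)$ and $Z = 0$ otherwise. The names `draw_decision` and `larger_mean_rule` are hypothetical.

```python
import random


def draw_decision(v, p, rng):
    """Second stage: given the sample point v, draw Z with P(Z = 1 | v) = p(v)."""
    return 1 if rng.random() < p(v) else 0


def larger_mean_rule(v):
    """A non-randomized rule written as a degenerate p(v): 1.0 if the first mean is larger."""
    x_1, x_2 = v
    return 1.0 if sum(x_1) / len(x_1) > sum(x_2) / len(x_2) else 0.0
```

When $p(v)$ takes only the values 0 and 1 the random experiment is degenerate and the outcome is determined by $v$ alone, which is the sense in which the non-randomized decision functions are included in the larger class.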

This definition of the structure of randomized decision functions follows the method described by Halmos and Savage in their interesting remarks ([6], pp. 239-241) on the value of sufficient statistics in statistical methodology. For any $Z_\phi(v)$, we have

(41) $P(Z_\phi(v) \le z \mid \omega) = \int_{E_N} P(Z_\phi(v) \le z \mid \omega, v) \, dK(v \mid \omega) = \int_{E_N} \phi(z \mid v) \, dK(v \mid \omega)$.

We shall now show that in all problems of the greater mean in which the methods of Section 2 can be applied to non-randomized decision functions, randomization cannot be recommended. More precisely, the following holds.

Theorem. Let $f(v)$ be a non-randomized decision function which takes on only the values 0 and 1 and which is the unique non-randomized decision function whose expected value $E[f \mid \omega]$ satisfies a certain condition $Q$ as a function of $\omega$. Then $f(v)$ is the unique decision function whose expected value satisfies the condition $Q$; i.e. if $Z_\phi(v)$ is a decision function such that $E[Z_\phi \mid \omega]$ satisfies $Q$, then

(42) $P(f(v) = Z_\phi(v) \mid \omega) = 1$ for all $\omega$.

It follows in particular that Theorem 2 remains valid with the arbitrary non-randomized $f(v)$ replaced by an arbitrary $Z_\phi(v)$, and in consequence, Theorems 3 and 4 remain valid when the class of decision functions in question is $\{Z_\phi(v)\}$.

Proof. Let $Z_\phi(v)$ be a decision function whose expected value satisfies the condition $Q$. Now, by (41) and Theorem 5 of [7] we have

(43) $E[Z_\phi \mid \omega] = \int_{E_N} f^*(v) \, dK(v \mid \omega) = E[f^* \mid \omega]$,

where

(44) $f^*(v) = \int_0^1 z \, d_z\phi(z \mid v)$, $0 \le f^*(v) \le 1$.

It is clear from (43) that $E[f^* \mid \omega]$ satisfies $Q$ and so we must have

(45) $f^*(v) = f(v)$ a.e.

by hypothesis. Since $f(v)$ takes on only the values 0 and 1, it follows from (44) and (45) that

$\int_{\{z = f(v)\}} d_z\phi(z \mid v) = 1$ a.e.,





which implies (42). In order to verify the last part of the remark, consider any particular problem of the greater mean. The risk function of any decision function $Z_\phi(v)$ is, by (15),

$r(Z_\phi \mid \omega) = W(\omega, E[Z_\phi \mid \omega])$.

Hence a condition on the risk function of $Z_\phi$ is equivalent to a condition on $E[Z_\phi \mid \omega]$ as a function of $\omega$, and the truth of the remark follows by appropriate definition of the condition $Q$ in terms of the risk function.

REFERENCES 

[1] A. Wald, “Statistical decision functions,” Annals of Math. Stat., Vol. 20 (1949), pp. 165-205.

[2] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical hypotheses,” Stat. Res. Memoirs, Vol. 1 (1936), pp. 1-37.

[3] R. R. Bahadur, “On a problem in the theory of k populations,” Annals of Math. Stat., Vol. 21 (1950), pp. 362-375.

[4] H. A. Simon, “Symmetric tests of the hypothesis that the mean of one normal population exceeds that of another,” Annals of Math. Stat., Vol. 14 (1943), pp. 149-154.

[5] A. Wald, Sequential analysis, John Wiley and Co., 1947.

[6] P. R. Halmos and L. J. Savage, “Application of the Radon-Nikodym theorem to the theory of sufficient statistics,” Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.

[7] H. Robbins, “Mixture of distributions,” Annals of Math. Stat., Vol. 19 (1948), pp. 360-369.



ANALYSIS OF EXTREME VALUES 
By W. J. Dixon 1 
University of Oregon 

1. Introduction. It is well recognized by those who collect or analyze data that values occur in a sample of n observations which are so far removed from the remaining values that the analyst is not willing to believe that these values have come from the same population. Many times values occur which are “dubious” in the eyes of the analyst and he feels that he should make a decision as to whether to accept or reject these values as part of his sample. On the other hand he may not be looking for an error, but may wish to recognize a situation when an occasional observation occurs which is from a different population. He may wish to discover whether a significant analysis of variance indicates an extreme value significantly different from the remainder. Also, of course, the extreme value may differ significantly without causing a significant analysis of variance and he may wish to discover this. It is reasonable to suppose that a criterion for rejecting observations would be useful here also. The choice of a suitable criterion for rejecting observations introduces a number of questions.

1. Should any observations be removed if we wish a representative sample including whatever contamination arises naturally? In other words, it may be
desirable to describe the population including all observations, for only in that 
way do we describe what is actually happening. 

2. If the analyst wishes to sample the population unaffected by contamination 
he must either remove the contaminating items or employ statistical procedures 
which reduce to a minimum the effect of the contamination on the estimates of 
the population. That is, he may wish to describe only 95% of his population 
if the description is altered radically by the remaining 5% of the observations. 
He may have external reasons which are good and sufficient for wishing to describe only 95% of his observations. Suppose he wishes to use the sample for a statistical inference; the inclusion of all the data may sufficiently violate the assumptions underlying the inference to exclude the possibility of making a valid inference.

This paper will concern itself only with those problems which arise from Ques¬ 
tion 2. 

If we wish to follow some procedure which attempts to remove contamination we must consider the performance of any proposed criterion with respect to the proportion of contamination the criterion will discover and, of course, the proportion of the “good” observations which are removed by the use of the criterion.
But, perhaps more important, we must consider what sort of bias will result 
when the standard statistical procedures are applied to samples of observations 
which have been processed in this manner. 

1 This paper was prepared under a contract with the Office of Naval Research 



If we wish to follow a procedure which will not search for particular values to 
be excluded but will minimize their effect if present, we must investigate the 
sampling distributions of these modified statistics and estimate the loss in information resulting from their use when all observations are “good.” We must also investigate the expected bias which will result when “bad” items are present even though essentially excluded. Perhaps most disturbing about the avoidance of “bad” items is the fact that a decision must still be made as to whether a “bad” item was present or not in order to know in which way our estimates may be biased. For example, a sample mean computed by avoiding the two end observations will not be a biased estimate of the mean of a symmetric population if both end items should actually be included or if both end items should not be included. However, if only one of the two should not be included this estimate of the mean will be biased.

2. Models of contamination. The performance of the various criteria for discovery of one or more contaminators will be measured with reference to contaminations of the following two types entering into samples of observations from a normal population with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$:

A. One or more observations from $N(\mu + \lambda\sigma, \sigma^2)$,

B. One or more observations from $N(\mu, \lambda^2\sigma^2)$.
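For readers who wish to experiment, the two models may be simulated directly. The Python sketch below is an illustration under stated assumptions and not the sampling procedure of the paper: the function name `contaminated_sample` and its parameters are hypothetical, and the contaminators are returned separately from the “good” observations so that a later step can tell whether an extreme value came from the contaminating population.

```python
import random


def contaminated_sample(n, k, lam, model, mu=0.0, sigma=1.0, rng=None):
    """Draw n observations, k of them contaminators; return (good, bad) lists.

    model "A": contaminators from N(mu + lam * sigma, sigma^2)   (location error)
    model "B": contaminators from N(mu, lam^2 * sigma^2)         (scalar error)
    """
    rng = rng or random.Random()
    good = [rng.gauss(mu, sigma) for _ in range(n - k)]
    if model == "A":
        bad = [rng.gauss(mu + lam * sigma, sigma) for _ in range(k)]
    elif model == "B":
        # random.gauss takes a standard deviation, hence lam * sigma.
        bad = [rng.gauss(mu, lam * sigma) for _ in range(k)]
    else:
        raise ValueError("model must be 'A' or 'B'")
    return good, bad
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when several criteria are to be compared on the same group of samples.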

A represents the occurrence of an “error” in mean value such as will occur in dial readings when errors are made in reading incorrectly digits other than the last one or two digits. Errors of this sort may result from momentary shifts in line voltage or from the inclusion among a group of objects of one or two items of completely different origin. This type of contamination will be referred to as “location error.” B represents the occurrence of an “error” from a population with the same mean but with a greater variance than the remainder of the sample. This type of error will be referred to as a “scalar error.” It is likely that many errors could be better described as a combination of A and B, but a study of these two errors separately should throw considerable light on the question of “gross errors” or “blunders.”

Many authors have written on the subject of the rejection of outlying observations. Apparently none have been successful in obtaining a general solution to the problem. Nor has there been success in the development of a criterion for discovery of outliers by means of a general statistical theory; e.g., maximum likelihood. A large number of criteria have been advanced on more or less intuitive grounds as appropriate criteria for this purpose. In no case was investigation made of the performance of these criteria except for a few illustrative examples.

References for the criteria discussed in the next section are given at the end of this paper. Indications are given as to the significance values available in those papers.





3. Criteria to be considered. The performance of two types of criteria has been investigated for samples contaminated with location or scalar errors.

a) $\sigma$ known or estimated independently,

b) $\sigma$ unknown.

The $n$ observations are ordered $x_1 \le x_2 \le \cdots \le x_n$. The criteria involving external knowledge of $\sigma$ are:

A. $\chi^2$ test,

$\chi^2 = \frac{\sum (x - \bar{x})^2}{\sigma^2}$.

B. Extreme deviation, 



B, - * ~ for * ~ 


C. Range,

$C_1 = \frac{w}{\sigma}$, $w = x_n - x_1$,

$C_2 = \frac{w}{s}$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ ($s$ independently estimated).

The criteria involving only the information of a single sample of n observations 
are: 

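D. Modified F test.

1. For single outlier $x_1$,

$D_1 = \frac{S_1^2}{S^2}$, where $S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2$, $\bar{x}_1 = \sum_{i=2}^{n} x_i/(n - 1)$, $S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $\bar{x} = \sum_{i=1}^{n} x_i/n$

(or for $x_n$, $D_1 = \frac{S_n^2}{S^2}$).

2. For double outliers $x_1, x_2$,

$D_2 = \frac{S_{1,2}^2}{S^2}$, where $S_{1,2}^2 = \sum_{i=3}^{n} (x_i - \bar{x}_{1,2})^2$, $\bar{x}_{1,2} = \sum_{i=3}^{n} x_i/(n - 2)$

(or for $x_n, x_{n-1}$, $D_2 = \frac{S_{n,n-1}^2}{S^2}$).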

E. Ratios of ranges and subranges.

1. For single outlier $x_1$,

$r_{10} = \frac{x_2 - x_1}{x_n - x_1}$ (or for $x_n$, $r_{10} = \frac{x_n - x_{n-1}}{x_n - x_1}$).

2. For single outlier $x_1$ avoiding $x_n$,

$r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1}$ (or for $x_n$ avoiding $x_1$, $r_{11} = \frac{x_n - x_{n-1}}{x_n - x_2}$).

3. For single outlier $x_1$ avoiding $x_n, x_{n-1}$,

$r_{12} = \frac{x_2 - x_1}{x_{n-2} - x_1}$ (or for $x_n$ avoiding $x_1, x_2$, $r_{12} = \frac{x_n - x_{n-1}}{x_n - x_3}$).

4. For outlier $x_1$ avoiding $x_2$,

$r_{20} = \frac{x_3 - x_1}{x_n - x_1}$ (or for $x_n$ avoiding $x_{n-1}$, $r_{20} = \frac{x_n - x_{n-2}}{x_n - x_1}$).

5. For outlier $x_1$ avoiding $x_2$ and $x_n$,

$r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1}$ (or for $x_n$ avoiding $x_{n-1}, x_1$, $r_{21} = \frac{x_n - x_{n-2}}{x_n - x_2}$).

6. For outlier $x_1$ avoiding $x_2$ and $x_n, x_{n-1}$,

$r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1}$ (or for $x_n$ avoiding $x_{n-1}, x_1, x_2$, $r_{22} = \frac{x_n - x_{n-2}}{x_n - x_3}$).

F. Extreme deviation and standard deviation.

For single outlier $x_n$,

$F = \frac{x_n - \bar{x}}{s}$ (or for $x_1$, $F = \frac{\bar{x} - x_1}{s}$).

The performance of the large number of criteria listed here will be assessed with respect to discovery of contamination of the type given in Section 2.
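Several of these criteria are simple enough to compute in a few lines. The Python sketch below is an illustration only, with hypothetical function names. It takes the ratio $r_{ij}$ for the largest observation in the form $(x_n - x_{n-i})/(x_n - x_{j+1})$ and for the smallest in the form $(x_{i+1} - x_1)/(x_{n-j} - x_1)$, and takes $B_1 = (x_n - \bar{x})/\sigma$ and $C_1 = (x_n - x_1)/\sigma$ with $\sigma$ supplied from outside the sample.

```python
from statistics import mean


def dixon_ratio(sample, i, j, upper=True):
    """Ratio r_ij for the largest (upper=True) or the smallest observation."""
    x = sorted(sample)
    n = len(x)
    if upper:
        # (x_n - x_{n-i}) / (x_n - x_{j+1}), written with 0-based indices
        return (x[n - 1] - x[n - 1 - i]) / (x[n - 1] - x[j])
    # (x_{i+1} - x_1) / (x_{n-j} - x_1), written with 0-based indices
    return (x[i] - x[0]) / (x[n - 1 - j] - x[0])


def extreme_deviation(sample, sigma):
    """B_1 for the largest observation, with sigma known."""
    return (max(sample) - mean(sample)) / sigma


def range_criterion(sample, sigma):
    """C_1, the range in units of the known sigma."""
    return (max(sample) - min(sample)) / sigma
```

For the sample 1, 2, 3, 4, 10, `dixon_ratio(sample, 1, 0)` is the gap 10 - 4 divided by the range 10 - 1. Whether a given value is significant must still be judged against the percentage points in the references, which are not reproduced here.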





4. Performance of criteria (estimate of $\sigma$ available). The $\chi^2$ test will of course give an indication of a large dispersion and since the extreme values are chief contributors to the sum of squares, it is possible to use this test as a criterion for rejecting a value or values which are at the greatest distance from the mean. It might be supposed that $B_1$ and $B_2$ would give better results since particular attention is paid to the end item. The same argument would influence one in favor of $C_1$ or $C_2$. The performance of $C_2$ can, of course, be expected to vary with the degrees of freedom in the independent estimate of $\sigma$. For this study the degrees of freedom for this estimate were held to the single value 9 d.f.

$\chi^2$ may be used since if the value of $\chi^2$ is too large (greater than some upper percentage point for $\chi^2$) we might reject the value most distant from the mean. $\chi^2$ tables may be used for percentage points. Percentage points for the other statistics considered here are given in the references at the end of this paper.

The criteria A, $B_1$, $B_2$, $C_1$, $C_2$ were investigated for $\alpha$ = 1%, 5% and 10% for $\lambda$ = 2, 3, 5, 7, where one or more items are selected from a population $N(\mu + \lambda\sigma, \sigma^2)$ and the remainder from $N(\mu, \sigma^2)$. Investigations were also made for one item from $N(\mu, \lambda^2\sigma^2)$ for $\lambda$ = 2, 4, 8, 12. The investigation was carried out by sampling methods. The performances of different criteria were assessed for the same group of samples in order to obtain more precision in the comparison of the different tests. All of the points appearing on the graphs in the subsequent sections of this paper were based on from 60 to 200 determinations.

The performance of the above criteria is measured by computing the proportion of the time the contaminating distribution provides an extreme value and the test discovers this value. Of course, performance could be measured by the proportion of the time the test gives a significant value when a member of the contaminating population is present in the sample, even though not at an extreme. However, since it is assumed that discovery of an outlier will frequently be followed by the rejection of an extreme we shall consider discovery a success only when the extreme value is from the contaminating distribution.

The performance was judged by applying the criteria to each sample, always 
suspecting an outlier in the direction of the shifted mean for location error. 
Since the location errors were inserted by adding a fixed value to one or more 
of the observations, the largest value was tested as an outlier. The measure of 
performance was the percentage of location errors identified. When the location 
error was not an outlier, no test was performed and a failure for the test recorded. 

In the case of the model of contamination involving the scalar error, the value 
was suspected which was farthest from the mean. This of course, alters somewhat 
the level of significance, but this procedure was followed alike for all criteria 
investigated. The performance was measured in the same fashion as for location 
errors. 
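The measure of performance described above lends itself to a small sampling experiment. The Python sketch below is a rough illustration and not a reconstruction of the computations actually made: it treats only criterion $B_1$ with $\sigma = 1$ known and a single location error, it estimates the critical value by simulation from uncontaminated samples instead of taking it from the published percentage points, and the names `b1`, `critical_value` and `performance` are hypothetical.

```python
import random


def b1(sample, sigma):
    """Extreme deviation (x_n - mean) / sigma for the largest observation."""
    return (max(sample) - sum(sample) / len(sample)) / sigma


def critical_value(n, alpha, trials, rng):
    """Simulated upper 100*alpha% point of B_1 for n uncontaminated N(0, 1) values."""
    null = sorted(b1([rng.gauss(0.0, 1.0) for _ in range(n)], 1.0) for _ in range(trials))
    return null[int((1.0 - alpha) * trials)]


def performance(n, lam, crit, trials, rng):
    """Proportion of samples in which the contaminator is the largest value and is detected."""
    hits = 0
    for _ in range(trials):
        good = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        bad = rng.gauss(lam, 1.0)              # one observation from N(lam, 1)
        # A success requires both that the contaminator is the extreme value
        # and that the test on the extreme value is significant.
        if bad > max(good) and b1(good + [bad], 1.0) > crit:
            hits += 1
    return hits / trials
```

With a few thousand trials the estimated performance rises steadily with $\lambda$, which is the general shape of the performance curves discussed in what follows; the individual values carry sampling error and should not be read as the published points.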

Considering first location errors, a study of the performance curves showing the per cent discovery of contaminators plotted against $\lambda$ (the number of standard deviation units the population of contaminators is removed from the remainder) shows that the level of performance for $\sigma$ known is considerably above the level





of performance when $\sigma$ is not known. The difference is greater for n = 5 than for n = 15 and, of course, the difference will diminish as the sample size increases. Figure 1 shows the performance curves for $\alpha$ = 5% (5% significance level for the test for an outlier) of $B_1 = (x_n - \bar{x})/\sigma$ for n = 5 and n = 15 and of $r_{10} = (x_n - x_{n-1})/(x_n - x_1)$ for n = 5 and n = 15.

The graphs for $\alpha$ = 1% and 10% would be similar in appearance. Figure 2 indicates the change in performance for $\alpha$ = 1%, 5%, and 10%. The curves plotted are for $B_1 = (x_n - \bar{x})/\sigma$. The curves for A, $B_2$, $C_1$, $C_2$ show very similar results.

The curve for test $B_1$ was used in Figures 1 and 2 since it gives the best performance of all criteria which are considered here if a single location error is present. The curves showing the comparative performance of these criteria as well as one to be considered later ($r_{10}$) are given in Figure 3 for $\alpha$ = 5% and for n = 5 and n = 15.

Fig. 1. Improvement in performance obtained with knowledge of $\sigma$; $\alpha$ = 5%; n = 5, 15.

Fig. 2. The effect of the level of significance on the performance of $B_1$; $\alpha$ = 1%, 5%, 10%; n = 5, 15.

The following statements can be made from inspection of Figure 3:

a) The differences among A, $B_1$, $B_2$, and $C_1$ are not great.

b) The knowledge of $\sigma$ is less important in larger samples.

c) The curve for $C_2$ lies above that of $r_{10}$ for n = 5 and below that of $r_{10}$ for n = 15. This is consistent with the use of 9 d.f. in the independent estimate of $\sigma$.

If the question of ease in computation or application is important, it may be desirable to use $B_2$ or $C_1$ in place of $B_1$ for they are slightly easier to compute and it is not necessary to measure all observations to obtain the value of these statistics. From Figure 3 it will be noted that the performances of these criteria are nearly as good as for $B_1$. If two outliers may be expected in a single sample, the performance of $B_2$ will be lowered and the performance of $B_1$ and $C_1$ will be improved. Any difference between the performance of $B_1$ and the performance of $C_1$ when two outliers are present was not discernible for n = 5 or 15. Figure 4 illustrates the improvement in performance for $B_1$ for $\alpha$ = 5% and n = 15.

Fig. 3. Comparison of the performance of criteria using $\sigma$ known (or using external estimates of $\sigma$) and $r_{10}$ for samples of size 5 and 15, $\alpha$ = 5%.

The performance curves of these criteria if a scalar error is present are very
similar to those above except that:

1. A high level of performance is approached very slowly. For example, see
Figure 5 showing the performance of $B_1$ and $r_{10}$ for n = 5 and n = 15 and
$\alpha$ = 5%.

2. There is a smaller difference in the performance between the criteria with
$\sigma$ known and $\sigma$ unknown (see Figure 5).

The performance of $B_1$ and $C_1$ is noticeably increased by the introduction
of more contaminators while that of $B_2$ decreases. No difference in the
performance of $B_1$ and $C_1$ was noted for either n = 5 or n = 15. Figure 6
shows the increase in performance with two contaminators for $B_1$ for n = 15,
$\alpha$ = 5%.

Fig. 4. Comparison of the performance of $B_1$ for one and two location errors
in samples of size 15, $\alpha$ = 5%.

The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of $B_1$ or $C_1$ if
$\sigma$ is known.

Criterion $C_1$ is recommended since:

1. Its performance is almost as good as the performance of $B_1$ for a single
outlier. Their performances are about equal for two outliers and $C_1$ affords
protection for outliers either above or below the mean.

2. It is simple to compute.

If ease of computation is not essential and maximum performance is desired,
the criterion $B_1$ should be used. The performance of $C_2$ will approach that
of $C_1$ as the number of degrees of freedom in the denominator increases.



Fig. 5. Comparison of the performance of $B_1$ and $r_{10}$ for one scalar error
for samples of size 5 and 15, $\alpha$ = 5%.

Fig. 6. Comparison of the performance of $B_1$ for one and two scalar errors in
samples of size 15, $\alpha$ = 5%.


5. Performance of criteria (no external estimate of $\sigma$). Criteria $D_1$
and $D_2$ have strong intuitive reasons for their use since the dispersion is
estimated by $s^2$. The $r$ ratios are attractive because of their simplicity
and their preoccupation with the extreme values. Test $F$ is the "studentized"
ratio corresponding to $B_1$, and is equivalent to $D_1$ since
$D_1 = 1 - F^2/(n - 1)$. There is no apparent difference in the performance of
$D_1$ and $r_{10}$ when one outlier is present and no apparent difference in
$D_2$ and $r_{20}$ when two outliers are present. This is true for both models
of contamination and for the three levels of significance investigated. However
the comparison of $D_2$ and $r_{20}$ was made only for n = 5 since critical
values are not available² for $D_2$ for n = 15. (Critical values are available
for $n \le 12$.)
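The equivalence of $F$ and $D_1$ stated above can be checked numerically. The
following is a minimal sketch, not part of the original study. It assumes the
definitions of Section 3, which are not reproduced in this part of the paper:
$D_1 = S_n^2/S^2$, the sum of squares about the mean with the largest
observation omitted divided by the sum of squares for the whole sample, and
$F = (x_n - \bar{x})/s$ with $s^2$ computed with divisor $n$. The function name
`d1_and_f` is hypothetical.

```python
import random


def d1_and_f(sample):
    """Return (D1, F) for the largest observation under the assumed definitions."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    s2_total = sum((x - mean) ** 2 for x in xs)        # S^2: all n observations
    rest = xs[:-1]                                     # largest observation omitted
    mean_rest = sum(rest) / (n - 1)
    s2_rest = sum((x - mean_rest) ** 2 for x in rest)  # S_n^2
    d1 = s2_rest / s2_total
    s = (s2_total / n) ** 0.5                          # divisor n is an assumption
    f = (xs[-1] - mean) / s
    return d1, f


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(15)]
d1, f = d1_and_f(sample)
n = len(sample)
# The two printed numbers should agree to rounding error.
print(d1, 1 - f ** 2 / (n - 1))
```

If $s^2$ were instead computed with divisor $n - 1$, the relation would hold
with a different constant, so the divisor is the assumption to check first when
comparing with published tables.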

The performance of $D_1$ and $r_{10}$ under the two models of contamination can
be obtained by reference to the curve for $r_{10}$ in Figure 1 and Figure 5. The
curve for $D_1$ is practically identical with the curve for $r_{10}$.


² After this paper was submitted, the critical values of $D_2$ have been
extended to $n \le 20$ (see references).




There is no question that $r_{10}$ is simpler to use, so that if this condition
of contamination (scalar errors) exists, $r_{10}$ would probably be chosen.
However as before, we should investigate what happens when more than one error
is present. $D_2$ is designed for this case as is $r_{20}$. Since the
performance of these two criteria is approximately the same, $r_{20}$ would
probably be chosen because of its simplicity. Critical values for this statistic
are available for $n \le 30$.

$r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$ were designed for use in
situations where additional outliers may occur and we wish to minimize the
effect of these outliers on the investigation of the particular value being
tested.
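How the $r$ ratios set aside neighbouring values can be seen in a short sketch.
It assumes the usual definitions of these ratios for testing the largest
observation $x_n$ in an ordered sample $x_1 \le \cdots \le x_n$, namely
$r_{ij} = (x_n - x_{n-i})/(x_n - x_{1+j})$; these definitions are given in
Section 3 and in the reference for the $r$'s, not in this part of the paper.
The function name `dixon_ratios` is hypothetical and distinct values are
assumed.

```python
def dixon_ratios(sample):
    """Ratios r10 ... r22 for the largest value (assumed standard definitions)."""
    x = sorted(sample)
    if len(x) < 5:
        raise ValueError("this sketch expects at least 5 observations")
    top = x[-1]
    ratios = {}
    for i in (1, 2):            # numerator: gap from x_n down to x_(n-i)
        for j in (0, 1, 2):     # denominator: range omitting the j smallest values
            ratios[f"r{i}{j}"] = (top - x[-1 - i]) / (top - x[j])
    return ratios


ratios = dixon_ratios([1.0, 2.0, 3.0, 4.0, 10.0])
for name in sorted(ratios):
    print(name, round(ratios[name], 4))
```

With five observations the numerator and denominator of $r_{22}$ coincide, so
the ratio is identically 1; this agrees with the remark later in the text that
$r_{22}$ is not a test for n = 5.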

It has been suggested that $D_1$ could be used repeatedly to remove more than
one outlier from a sample. This procedure cannot be recommended since the
presence of additional outliers handicaps the performance of both $D_1$ and
$r_{10}$ for small sample sizes and therefore the process of rejection might
never get started. For larger sample sizes the performance of $D_1$ is affected
much less by the presence of two errors than is the performance of $r_{10}$. The
repetitive use of $D_1$ is not recommended in this case either since $r_{20}$
performs in a superior manner to $D_1$ in such situations. This difference in
performance of $D_1$ and $r_{20}$ depends markedly on the level of significance
used as well as the sample size.
For small samples there is little difference in performance for any of the
levels of significance one might use. For the larger sample sizes there is no
appreciable difference for very high levels of significance. The difference is
however very great for lower levels of significance. In fact as $\lambda$
increases for two errors of the location type, the level of significance which
divides the region of approach to zero performance from the region of approach
to perfect performance of $D_1$ is given by the level of significance
corresponding to a significance value of $n/[2(n - 1)]$ for $D_1$. Thus, for
example, in samples of size 15, $15/(2 \cdot 14) = .536$. This value lies
between the values for the 2.5% and 5% level of significance. These values are
.503 and .556 respectively. Therefore the use of the 1% or 2.5% levels will give
poorer and poorer performance as $\lambda$ increases, and the use of the 5% or
10% levels will give better and better performance as $\lambda$ increases when
two errors are present. The dividing point is such that for samples of size 11
or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size $n \le 14$
the 1%, 2.5% and 5% levels have the same effect, and for samples of size
$n \le 16$ the 1% and 2.5%, for samples of size $n \le 19$ just the 1% level.
For three such errors the limit approached by $D_1$ as $\lambda$ increases is
$2n/[3(n - 1)]$. Therefore, the performance of $D_1$ will approach zero for all
levels of significance and for all sample sizes for which critical values are
known except the 10% level of significance for sample sizes larger than 21. An
indication of these limiting values, $\frac{k-1}{k} \cdot \frac{n}{n-1}$, for
$k$ contaminations present can be obtained by considering these $k$ values to
be at a distance $\lambda$ from the population mean, computing $D_1$ and
allowing $\lambda$ to increase indefinitely.

Fig. 7. Comparison of the performance of the $r$ criteria for one location error
in samples of size 5, $\alpha$ = 5%.

Fig. 8. Comparison of the performance of the $r$ criteria for one scalar error
in samples of size 5, $\alpha$ = 5%.
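The limiting values just described can be illustrated numerically. The sketch
below is an illustration under stated assumptions rather than a computation
from the paper: it takes $D_1 = S_n^2/S^2$ (the definition of Section 3, assumed
here), places $k$ contaminators at a distance $\lambda$ from the mean of the
remaining standard normal observations, and compares $D_1$ with
$\frac{k-1}{k} \cdot \frac{n}{n-1}$. The function names are hypothetical.

```python
import random


def d1(sample):
    """D1 = S_n^2 / S^2, the largest observation being the one omitted."""
    xs = sorted(sample)

    def sum_sq(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    return sum_sq(xs[:-1]) / sum_sq(xs)


def limiting_d1(n, k):
    """Limit of D1 as lambda increases, k location errors among n observations."""
    return (k - 1) / k * n / (n - 1)


random.seed(1)
n, k = 15, 2
clean = [random.gauss(0.0, 1.0) for _ in range(n - k)]
noise = [random.gauss(0.0, 1.0) for _ in range(k)]
for lam in (5.0, 50.0, 1e4):
    sample = clean + [lam + e for e in noise]
    # D1 moves toward the limiting value as lam grows.
    print(lam, round(d1(sample), 4), round(limiting_d1(n, k), 4))
```

For n = 15 and k = 2 the limit is 15/28, the value .536 quoted above; for three
errors the same function gives $2n/[3(n - 1)]$.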

The comparative performance of the $r$ criteria, $\alpha$ = 5%, in samples of
size 5 for the two models of contamination (one contaminator present) is given
in Figures 7 and 8. For samples of size 15 the curves are given in Figures 9 and
10. A single curve suffices here since there is no discernible difference in the
curves for the different $r$ criteria. There is considerable difference in the
performance curves if more than one outlier is present. However, the
performances of $r_{10}$, $r_{11}$, $r_{12}$ are essentially the same when two
location outliers are present as are the performances of $r_{20}$, $r_{21}$,
$r_{22}$. Figures 11 and 12 show the comparative performance of $r_{10}$,
$r_{11}$, $r_{12}$ for one and two contaminators for $\alpha$ = 5% and n = 5.
Figures 13 and 14 are for n = 15. Figures 15 and 16 show the comparative
performance for $r_{20}$, $r_{21}$ ($r_{22}$ is not a test for n = 5) for one
and two contaminators for $\alpha$ = 5% and n = 5. Figures 17 and 18 are for
$r_{20}$, $r_{21}$, $r_{22}$ for n = 15. The six curves represented by the
single curve of Figure 17 lie within 5% of the curve shown. The same is true of
the three curves represented by each of the two curves of Figure 18.

Fig. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and
two location errors in samples of size 5, $\alpha$ = 5%.

Fig. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and
two scalar errors in samples of size 5, $\alpha$ = 5%.

Since no loss in performance results for larger samples from the use of
$r_{20}$, $r_{21}$, $r_{22}$ in place of $r_{10}$, $r_{11}$, $r_{12}$, and
further, these criteria are not appreciably affected by the presence of another
outlier, it would seem unwise to recommend the use of $r_{10}$, $r_{11}$,
$r_{12}$. However, note that for small samples (see Figures 11 and 12) the
performances of $r_{10}$ and $r_{11}$ and $r_{12}$ are considerably better when
a single outlier is present. Therefore in larger (n > 10) samples $r_{20}$ or
$r_{21}$ would appear to be the best criteria. In samples of size 10 or less,
$r_{10}$ or $r_{20}$ should be used; $r_{11}$ if the extreme value at the
opposite end should be avoided.

Fig. 13. Comparison of the performance of the $r_{1\cdot}$ criteria for one and
two location errors in samples of size 15, $\alpha$ = 5%.

Fig. 14. Comparison of the performance of the $r_{1\cdot}$ criteria for one and
two scalar errors in samples of size 15, $\alpha$ = 5%.

Fig. 15. Comparison of the performance of the $r_{2\cdot}$ criteria for one and
two location errors in samples of size 5, $\alpha$ = 5%.

Fig. 16. Comparison of the performance of the $r_{2\cdot}$ criteria for one and
two scalar errors in samples of size 5, $\alpha$ = 5%.

It should be noted in the comparisons that no model of contamination was
investigated which would cause one or more errors at both extremes in the
sample. It is obvious that the performance of $D_1$ and $D_2$ would be
considerably decreased while the performance of $r_{11}$, $r_{12}$, and
$r_{21}$, $r_{22}$ would not be materially affected since these criteria avoid
values at the opposite extreme. Their repeated use might discover most of such
outliers, while $D_1$ or $D_2$ might fail on the first trial.



Fig. 17. Comparison of the performance of the $r_{2\cdot}$ criteria for one and
two location errors in samples of size 15, $\alpha$ = 5%.

Fig. 18. Comparison of the performance of the $r_{2\cdot}$ criteria for one and
two scalar errors in samples of size 15, $\alpha$ = 5%.

Fig. 19. Performance of $B_1$ for various levels of significance when the
population is 10% contaminated with location errors.


6. Sampling from a contaminated population. In the previous sections the
performance of the various criteria was assessed for samples where a certain
number of contaminators were present. One might well ask why a test is needed
if it is known that contaminators are present. It would seem more realistic to
state that a certain per cent of contamination will occur in the long run and
that one will not know in any particular case whether 0, 1, 2, ... contaminators
will be present. One would then wish a criterion to indicate the presence of
contamination in a particular sample.

The performances of these criteria will be investigated for the same two
models of contamination and their performances will be reported as per cent of
total contamination discovered. The tests will be applied only once to each
sample. Repeated use of the criterion would in many cases increase the per cent
of total contamination discovered. It is not known what effect such a procedure
would have on the level of significance.

Fig. 20. Performance of $B_1$ for various levels of significance when the
population is 10% contaminated with scalar errors (panels for n = 5 and n = 15).

Fig. 21. Performance of $B_1$ for various levels of contamination for location
errors and using the 5% level of significance.

Investigation has been made for 5, 10, and 20% contamination. For example,
in samples of size 5 which have 10% contamination, on the average, 59.0% of
the samples will contain no "errors", 32.8% will contain one, 7.3% two, 0.8%
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10%
contaminated with location errors having mean $\mu + 5\sigma$, about 59 contain
no errors. If the $r_{10}$ criterion is used with a 5% level of significance one
value will be "discovered" in 3.0 of the samples containing no errors. Of the 33
samples containing one "error" the "error" would be discovered in 18 of these
samples. This criterion would discover none of the "errors" in samples
containing more than one "error". We would have obtained 18 of the 50
contaminating values and 3 which were members of the original population.

Fig. 22. Performance of $B_1$ for various levels of contamination for scalar
errors and using the 5% level of significance.

Fig. 23. Performance of $r_{10}$, $D_1$, $r_{20}$, $D_2$ in samples of size 5
using the 5% level of significance and sampling from a population which is 10%
contaminated (panels: location, scalar).
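The percentages of samples containing 0, 1, 2, ... "errors" follow the binomial
distribution when each observation is independently a contaminator with the
stated probability. That independence is the assumption behind the short sketch
below, and the function name is hypothetical. The sketch reproduces the leading
figures of the example and the count of 50 contaminating values expected in 100
samples of 5.

```python
from math import comb


def contaminator_count_distribution(n, p):
    """P(j contaminators in a sample of n), j = 0..n, for contamination rate p."""
    return [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]


probs = contaminator_count_distribution(5, 0.10)
for j, pr in enumerate(probs):
    print(j, f"{100 * pr:.3f}%")

# Expected number of contaminating values in 100 samples of size 5.
expected_contaminators = 100 * sum(j * pr for j, pr in enumerate(probs))
print(expected_contaminators)
```

Carried to more places, the entry for four contaminators is about 0.045%, so
the last places of the percentages quoted above reflect rounding.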

When $\sigma$ is known the performance will increase when more contaminators
are present. Performance however has been measured in terms of finding a
single contaminator; i.e., the test has been used only once. Consequently, the
level of performance will decrease with increasing per cent contamination.
Repeated use of the test criteria has not been investigated.


Fig. 24. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$,
$r_{21}$) for various levels of significance when the population is 10%
contaminated with location errors.

Fig. 25. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$,
$r_{21}$) for various levels of significance when the population is 10%
contaminated with scalar errors.


Criterion $B_1$ gives the best performance for both location and scalar errors
for the levels of contamination and levels of significance considered. $A$ and
$C_1$ are only slightly inferior. $B_2$ is handicapped when more than one error
is present; thus its performance is poorer for heavier contamination. Figure 19
shows the performance of $B_1$ for the different levels of significance, 10%
contamination, and the two sample sizes 5 and 15 for location errors. Figure 20
shows the results for scalar errors. Figures 21 and 22 show the performance of
$B_1$ for the 5% level of significance for the different levels of
contamination.

When $\sigma$ is not known the performance of various criteria will eventually
decrease as more and more contaminators are present in the sample even though
several of the criteria show improvement in discovering a single error if two
are present. The performance of these criteria is greatly affected by the size
of the sample. For samples of size 5, $r_{10}$ and $D_1$ perform alike, $r_{10}$
being superior to the other $r$'s ($r_{20}$ second best) for the levels of
contamination considered, and $D_2$ is inferior to $r_{20}$. Figure 23 compares
the performance of $r_{10}$, $D_1$, $r_{20}$, and $D_2$ for the 5% level of
significance and 10% contamination. The results for other levels of significance
and contamination are comparable.

Fig. 26. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$,
$r_{21}$) for various levels of contamination for location errors and using the
5% level of significance.

Fig. 27. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$,
$r_{21}$) for various levels of contamination for scalar errors and the 5% level
of significance, $\alpha$ = 5%.

For samples of size 15, $r_{10}$, $r_{11}$ and $r_{12}$ perform alike as do
$r_{20}$, $r_{21}$ and $r_{22}$. $D_1$ and $r_{20}$, $r_{21}$, $r_{22}$ perform
approximately the same and are superior to $r_{10}$, $r_{11}$, and $r_{12}$.
Critical values are not available for $D_2$ for n > 12. The performances of
$D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ are indicated by a single line in Figures
24, 25, 26, and 27 which show the effect of level of significance and level of
contamination on the performance of $D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ for
samples of size 15 and of $r_{10}$ ($D_1$) for samples of size 5.

Fig. 28. A comparison of the performance of $r_{22}$ and $D_1$ for two scalar
contaminators when tests are made at one extreme only, $\alpha$ = 5%, n = 15.

7. Remarks and conclusions. Throughout the investigation of performance,
location errors were placed only at one extreme and scalar errors at either
extreme. The test for an error was made using as a suspected value the extreme
value in the direction of the location error or in the case of the scalar error
the value most distant from the mean. It can be expected then that if
performance were assessed when location errors could occur in either direction,
different results would be obtained. Also in the case of scalar errors if errors
were always sought at one particular extreme or at both extremes different
results would be obtained. If these changes were made in the models of
contamination, those criteria designed to avoid errors at the other extreme
would have an advantage over those which were not so designed for $\sigma$
unknown. If $\sigma$ is known the criteria which do not avoid the other extreme
would have an advantage over those which do avoid the other extreme. These
points just mentioned will be used to discriminate between those criteria which
were judged to be equal in performance under the models used in the sampling
study. For example, Figure 28 compares the performance of $r_{22}$ and $D_1$ for
two scalar contaminators when tests are made only at one extreme, $\alpha$ = 5%,
n = 15.

1. For $\sigma$ known:

$B_1$ or $C_1$ should be used, or in small samples $A$ or $C_1$ should be used.

2. For $\sigma$ unknown:

$r_{10}$ should be used for very small samples. $r_{22}$ should be used for
sample sizes over 15. Probably $r_{21}$ would be best for sample sizes from
about 8 to 13. If simplicity in computation is not important and "errors" are
not expected at both extremes $D_1$ would do equally well. When critical values
are available for larger n, $D_2$ should prove useful in the larger sample
sizes.

LITERATURE REFERRING TO CRITERIA LISTED IN SECTION 3

($B_1$) A. T. McKay, "The distribution of the difference between the extreme
observation and the sample mean in samples of n from a normal universe,"
Biometrika, Vol. 27 (1935), pp. 466-471. Procedures for obtaining percentage
values given.

($B_2$) J. O. Irwin, "On a criterion for the rejection of outlying
observations," Biometrika, Vol. 17 (1925), pp. 238-250. Pr($B_2 > \lambda$),
$\lambda$ = .1(.1)5.0, n = 2, 3, 10(10)100(100)1,000. Tables concerning the
second and third ordered observations are also given.

($C_1$) E. S. Pearson and H. O. Hartley, "The probability integral of the range
in samples of n observations from the normal population," Biometrika, Vol. 32
(1942), pp. 301-310. 0.1%, 0.5%, 1.0%, 2.5%, 5%, 10%, n = 2(1)12, values to 20
available by interpolation.

($C_2$) D. Newman, "The distribution of ranges in samples from a normal
population, expressed in terms of an independent estimate of the standard
deviation," Biometrika, Vol. 31 (1940), pp. 20-30. 1% and 5% points for $C_2$;
for w, n = 2(1)12, 20; s, d.f. = 5(1)20, 24, 30, 40, 60, $\infty$.

($C_2$) E. S. Pearson and H. O. Hartley, "Tables of the probability integral of
the studentized range," Biometrika, Vol. 33 (1943), pp. 89-99. Upper and lower
5% and 1% points for $C_2$, for w, n = 2(1)20; for s, d.f. = 10(1)20, 24, 30,
40, 60, 120, $\infty$.

($C_2$, $B_1$) K. R. Nair, "The distribution of the extreme deviate from the
sample mean and its studentized forms," Biometrika, Vol. 35 (1948), pp.
118-144. $B_1$ upper and lower .1%, .5%, 1%, 2.5%, 5%, 10% points for
n = 3(1)9.

($D_1$, $D_2$, $F$, $B_1$) F. E. Grubbs, "Sample criteria for testing outlying
observations," Annals of Math. Stat., Vol. 21 (1950), pp. 27-58. $F$, $B_1$:
1%, 2.5%, 5%, 10%, $n \le 25$; $D_2$: 1%, 2.5%, 5%, 10%, $n \le 20$; $D_1$: 1%,
2.5%, 5%, 10%, $n \le 25$.

($F$) W. R. Thompson, "On a criterion for the rejection of observations and the
distribution of the ratio of deviation to sample standard deviation," Annals of
Math. Stat., Vol. 6 (1935), pp. 214-219. 20%, 10%, 5%, n = 3(1)22(10)42, 102,
202, 502, 1002.

($F$) E. S. Pearson and Chandra Sekar give a further discussion of $F$ in "The
efficiency of statistical tools and a criterion for the rejection of outlying
observations," Biometrika, Vol. 28 (1936), pp. 308-320. 10%, 5%, 2.5%, 1%,
n = 3(1)19.

($r$'s) W. J. Dixon, "Ratios involving extreme values," Annals of Math. Stat.,
to be published. $r_{10}$, $r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$;
.5%, 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, $n \le 30$.



DISTRIBUTIONS RELATED TO COMPARISON OF TWO 
MEANS AND TWO REGRESSION COEFFICIENTS 

By Uttam Chand 1 
University of North Carolina 

Summary. We consider here the relative merits of different statistics available
for testing two means or two regression coefficients in relation to one-sided
(asymmetric) and two-sided (symmetric) alternatives in case of unequal
population variances. In so far as the Behrens-Fisher statistic is concerned we
confine ourselves to the consideration of the behavior of its probability of
Type I error in repeated sampling from populations with a fixed value of the
unknown ratio of variances. In connection with the tests between two means, the
present study takes its point of departure from the existing tests and
investigates the question of utilizing an approximately determinate knowledge
about the unknown ratio of variances. In connection with the comparison of two
regression coefficients and also of two linear regression functions, we consider
the effect of two concomitant sources of variation, viz., the unknown ratio of
residual variances and the ratio of the sums of squares of the fixed variates,
on the probability of Type I and Type II errors of certain well known
statistics.


1. Introduction. Consider two independent samples $x_1, \cdots, x_{n_1+1}$ and
$x'_1, \cdots, x'_{n_2+1}$ drawn from two normal populations with means $m_1$
and $m_2$, variances $\sigma_1^2$ and $\sigma_2^2$. Let
$K = \sigma_1^2/\sigma_2^2$. If $K$ is known and $m_1 = m_2$, the quantity

\[
t_K = (\bar{x} - \bar{x}') \left[ \frac{S(x - \bar{x})^2 + K S'(x' - \bar{x}')^2}{n_1 + n_2}
      \left( \frac{1}{n_1 + 1} + \frac{1}{K(n_2 + 1)} \right) \right]^{-1/2}
\]

($t_1$ is Fisher's $t$) is distributed according to "Student's" distribution
with $n_1 + n_2$ d.o.f.² and for the "Student's" hypothesis $H_0: m_1 = m_2$
provides a uniformly most powerful test against an asymmetric alternative
$H_1: m_1 >$ (or $<$) $m_2$ and a type $B_1$ test against a symmetric
alternative $H_2: m_1 \ne m_2$. If $K$ is unknown certain approximate and exact
tests have been suggested from time to time to meet this situation.

Welch [1], [2] using an approximation to the distribution of $t_1$ was the first
to point out that if $K$ is unknown and we assume it to be equal to unity, then
the probability of Type I error of the $t_1$-test is subject to large variations
as $K$ varies from 0 to $\infty$. He also pointed out that the statistic

\[
v = (\bar{x} - \bar{x}') \left[ \frac{S(x - \bar{x})^2}{n_1(n_1 + 1)}
    + \frac{S'(x' - \bar{x}')^2}{n_2(n_2 + 1)} \right]^{-1/2}
\]

which does not have "Student's" distribution for $K = 1$, has the advantage
that its probability of Type I error is subject to less variation with respect
to $K$. His approximate results were later confirmed by Hsu [3] who obtained the
distribution of quantities $t_1^2$ and $u^2$ ($= v^2$) and also showed that
these tests are unbiased in the sense of Neyman and Pearson. Hsu concluded on
the basis of his investigations that when the sample sizes are equal and not
very small, we may safely use $v$ ($= t_1$) as if $K$ were unity. This also had
been pointed out by Welch.

¹ Now Assistant Professor of Mathematical Statistics at Boston University.
² Degrees of freedom.

If on the basis of past experience some approximate value $k$ of $K$ were
available, one would like to know if such a choice in some rough neighborhood of
$K$ would in any way improve the claim of $t_k$ ($= t_K$ for $K = k$) for the
hypothesis $m_1 = m_2$. The distribution of this generic quantity $t_k$
($= t_1$ for $k = 1$; $= v$ for $k = \frac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$) will be
obtained in Section 2.1. It will be shown that variation in the probability of
Type I error of $t_k$ with respect to $K$ for any $k$ except when $t_k = v$, is
essentially similar in character to that of $t_1$ [3] and is very sensitive in a
neighborhood of $K$ in which one would very often be interested (Section 2.4).
This is also true of the behavior of the power function of $t_k$ with respect to
$K$. Consequently a $t_k$ type of statistic will be unsuitable in general for
utilizing an approximately determinate knowledge of $K$.
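That $t_k$ coincides with $v$ when $k = n_1(n_1 + 1)/[n_2(n_2 + 1)]$ can be
verified directly. The sketch below is an illustration only; it takes $t_k$ to
be the quantity $t_K$ of this section with $K$ replaced by $k$ and $v$ to be
Welch's statistic as displayed, with samples of sizes $n_1 + 1$ and $n_2 + 1$.
The function names are hypothetical.

```python
import random


def mean_and_sum_sq(values):
    """Return the sample mean and the sum of squared deviations about it."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values)


def t_k(x, y, k):
    """t_k: the statistic t_K with the assumed ratio k in place of K."""
    n1, n2 = len(x) - 1, len(y) - 1
    mx, sx = mean_and_sum_sq(x)
    my, sy = mean_and_sum_sq(y)
    pooled = (sx + k * sy) / (n1 + n2)
    scale = 1.0 / (n1 + 1) + 1.0 / (k * (n2 + 1))
    return (mx - my) / (pooled * scale) ** 0.5


def v_stat(x, y):
    """Welch's statistic v."""
    n1, n2 = len(x) - 1, len(y) - 1
    mx, sx = mean_and_sum_sq(x)
    my, sy = mean_and_sum_sq(y)
    return (mx - my) / (sx / (n1 * (n1 + 1)) + sy / (n2 * (n2 + 1))) ** 0.5


random.seed(2)
x = [random.gauss(0.0, 2.0) for _ in range(6)]    # n1 = 5
y = [random.gauss(0.0, 1.0) for _ in range(11)]   # n2 = 10
n1, n2 = len(x) - 1, len(y) - 1
k_star = n1 * (n1 + 1) / (n2 * (n2 + 1))
print(t_k(x, y, k_star), v_stat(x, y))   # equal up to rounding
print(t_k(x, y, 1.0))                    # Fisher's t (k = 1), generally different
```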

It is not possible to infer directly from Hsu's work on the relative merits of
$t_1$ and $v$ in relation to asymmetric aspects of "Student's" hypothesis. His
basic conclusions as regards unbiasedness and the nature of variations in Type I
error in the symmetric case also hold for the asymmetric case except that the
Type I variations in $t_1$ and $v$ are less for asymmetric than for symmetric
comparisons (Section 2.5 and Table II). Furthermore it appears (Section 2.6 and
Table III) that with respect to the variations of $K$ both the asymmetric and
symmetric power functions of $t_1$ are likely to be more sensitive than those of
$v$. Since for equal d.o.f. both the asymmetric probability of Type I error and
power function are insensitive to the vagaries of the 'nuisance' parameter $K$,
there is an a fortiori reason for using $v$ ($= t_1$) as if $K$ were unity.

Scheffé [4] considered the statistic

\[
S = (\bar{x} - \bar{x}') \left[ \frac{\sum_{i=1}^{n_2+1} (u_i - \bar{u})^2}{n_2(n_2 + 1)} \right]^{-1/2}
\qquad (n_2 \le n_1),
\]

(equivalent to paired difference $t$ when $n_1 = n_2$) where
$u_i = x'_i - x_i \left( \frac{n_2 + 1}{n_1 + 1} \right)^{1/2}$
and where it is assumed that the variates in each sample have been randomized.
This is essentially a "Student's" $t$ comparison based on $n_2$ d.o.f. and as
shown by Scheffé it is impossible to get a suitable statistic with the
$t$-distribution with more than $n_2$ d.o.f. The statistic $v$ has the
$t$-distribution only when $K = \infty$ ($n_1$ d.o.f.), $K = 0$ ($n_2$ d.o.f.)
and $K = \frac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$ ($n_1 + n_2$ d.o.f.). For any given
$n_1$, $n_2$, $K$ and $P$ we can solve $P = P(v > t_0 \mid H_0)$ for $t_0$ and
thus indirectly obtain from the tabulated values of the $t$-distribution the
number of 'effective' d.o.f. which will thus adjust $v$ to any preassigned level
of significance. We try to show in Section 2.6 that in situations where some
approximate knowledge of $K$ is available, the statistic $v$ seems to have a
decided advantage over any other statistic having the $t$-distribution. We show
by actual computations that Welch's formula [2] provides a conservative estimate
for the effective d.o.f. in the light of which this comparison will be
considered.
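The notion of 'effective' d.o.f. can be illustrated with a short sketch.
Welch's formula [2] is not reproduced in the text; the sketch assumes it is the
familiar moment-matching approximation, written here in terms of the true ratio
$K$ rather than the sample variances. The function name `effective_dof` and
this form of the formula are therefore assumptions.

```python
def effective_dof(n1, n2, K):
    """Approximate d.o.f. for v with samples of sizes n1 + 1 and n2 + 1.

    a and b are the variances of the two sample means in units of sigma_2^2;
    the moment-matching form used here is an assumed version of Welch's formula.
    """
    a = K / (n1 + 1)
    b = 1.0 / (n2 + 1)
    return (a + b) ** 2 / (a ** 2 / n1 + b ** 2 / n2)


n1, n2 = 5, 10
k_star = n1 * (n1 + 1) / (n2 * (n2 + 1))
for K in (1e-9, 0.1, k_star, 1.0, 10.0, 1e9):
    print(K, round(effective_dof(n1, n2, K), 3))
```

Under this assumed form the approximation returns $n_2$, $n_1 + n_2$ and $n_1$
at $K = 0$, $K = n_1(n_1 + 1)/[n_2(n_2 + 1)]$ and $K = \infty$ respectively,
which are the three cases in which $v$ has the $t$-distribution exactly.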

The Behrens-Fisher fiducial test employing the statistic d [5], [6], which has 
essentially the same structural form as v, has given rise to much controversy 
essentially because of inconsistencies arising from tests of significance based 
on the fiducial distribution of unknown parameters. We attempt to show in 
Section 2.7 that the fiducial test in general is 'conservative' in detecting significant
results in repeated sampling from populations with a fixed value of the unknown 
ratio of variances. 

In the case of comparison of two regression coefficients when the residual
variances are unequal, we are faced with a similar type of problem. Consider
two samples $y_\mu \mid x_\mu$ and $y'_\nu \mid x'_\nu$
($\mu = 1, \cdots, n_1 + 1$; $\nu = 1, \cdots, n_2 + 1$), where $x_\mu$ and
$x'_\nu$ are fixed and $y_\mu$ and $y'_\nu$ are normally and independently
distributed according to $N(\alpha_1 + \beta_1(x_\mu - \bar{x}), \sigma_1^2)$
and $N(\alpha_2 + \beta_2(x'_\nu - \bar{x}'), \sigma_2^2)$ respectively. For the
hypothesis $\beta_1 = \beta_2$ when the alternatives do not specify anything
except $\beta_1 > \beta_2$ or $< \beta_2$, or $\beta_1 \ne \beta_2$ we shall
consider the merits of statistics $t^*$ and $v^*$ which correspond to statistics
$t_1$ and $v$ for the two means. While the statistic $t^*$ is sensitive to the
variation of both $K = \sigma_1^2/\sigma_2^2$ and $w$, the ratio of the sums of
squares of the fixed variates, the statistic $v^*$ is insensitive to the
variation of both. Barankin³ has extended Scheffé's test to the comparison of
two regression coefficients under the above assumptions. The statistic proposed
by Barankin has Student's distribution with $n_2 - 1$ d.o.f. ($n_2 \le n_1$) and
provides the only exact unbiased test so far known. While Scheffé's test for the
comparison of two means and Barankin's test for the comparison of two regression
coefficients should not be used when $K$ is known and were never intended to
utilize any available approximate information about $K$, the question of
investigating the possibility of using $v^*$ in the latter situation is not
without interest (Section 3). In Section 4 we consider the hypothesis of
equality of two linear regression functions viz., $H_0$: $\alpha_1 = \alpha_2$,
$\beta_1 = \beta_2$ when the alternatives do not specify anything except
$\alpha_1 \ne \alpha_2$ or $\beta_1 \ne \beta_2$.

In studying the behavior of the power function and the probability of Type I 
error of certain statistics under discussion we have made full use of Hsu’s method 
and consequently only essential details have been given here.

2. Hypothesis of equality of two means when variances are unequal 

2.1. The distribution of $t_k$ for any values of $n_1$ and $n_2$. Consider the
test function $t_k$ ($= t_K$ for $K = k$; Section 1) where $k$ is some inexact
value of $K$. This can be put in the form
$t_k = (\xi + \delta)(b\chi_1^2 + c\chi_2^2)^{-1/2}$ where $\xi$ is $N(0, 1)$
and the $\chi^2$'s have independent $\chi^2$-distributions with $n_1$ and $n_2$
d.o.f., and where

\[
\delta = (m_1 - m_2) \left( \frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1} \right)^{-1/2},
\]
\[
b = (K/k)(n_1 + n_2)^{-1}[k(n_2 + 1) + n_1 + 1][K(n_2 + 1) + n_1 + 1]^{-1},
\]
\[
c = (n_1 + n_2)^{-1}[k(n_2 + 1) + n_1 + 1][K(n_2 + 1) + n_1 + 1]^{-1},
\]
\[
b/c = K/k.
\]

³ E. W. Barankin, "Extension of the Romanovsky-Bartlett-Scheffé test," Proc.
Berkeley Symposium on Math. Stat. and Prob., University of California Press,
1949, pp. 433-449.


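In what follows we shall omit the subscript $k$ from $t_k$. The joint
probability element of $\xi$, $\chi_1^2$ and $\chi_2^2$ is given by

\[
dF(\xi, \chi_1^2, \chi_2^2) = [4(2\pi)^{1/2}]^{-1}[\Gamma(n_1/2)\Gamma(n_2/2)]^{-1}
e^{-\frac{1}{2}(\xi^2 + \chi_1^2 + \chi_2^2)}
(\chi_1^2/2)^{n_1/2 - 1}(\chi_2^2/2)^{n_2/2 - 1}\, d\xi\, d(\chi_1^2)\, d(\chi_2^2).
\]

We transform to new variables $t$, $r$ and $\theta$ by the relations

\[
\xi + \delta = t(b\chi_1^2 + c\chi_2^2)^{1/2},
\]
\[
b\chi_1^2 = r^2\cos^2\theta \qquad (0 \le \theta \le \pi/2),
\]
\[
c\chi_2^2 = r^2\sin^2\theta \qquad (-\infty < t < +\infty),
\]

and integrate out $r$. To integrate out $\theta$ we put $z = \sin^2\theta$ if
$b < c$ and $z = \cos^2\theta$ if $b > c$. This reduces the integration w.r.t.
$\theta$ to a series of hypergeometric integrals. We finally have the following
form for the frequency function of $t_k$:

\[
g(t) = \frac{(b/\pi)^{1/2}(b/c)^{n_2/2} e^{-\delta^2/2}}{\Gamma\left(\frac{n_1 + n_2}{2}\right)}
\sum_{r=0}^{\infty} \frac{(\delta t)^r (2b)^{r/2}\, \Gamma\left(\frac{n_1 + n_2 + r + 1}{2}\right)}
{r!\,(1 + bt^2)^{(n_1 + n_2 + r + 1)/2}}
F\left(\frac{n_1 + n_2 + r + 1}{2}, \frac{n_2}{2}, \frac{n_1 + n_2}{2}, \frac{1 - b/c}{1 + bt^2}\right),
\tag{2.1.1}
\]

where $F$ denotes the hypergeometric function. As a check if we put
$b = c = (n_1 + n_2)^{-1}$, we get the frequency function of non-central $t$ for
$n_1 + n_2$ d.o.f. For the case $b > c$ we have only to interchange $b$ with $c$
and $n_1$ with $n_2$.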

The null distribution of $t_k$ ($\delta = 0$) is an even function of $t_k$,
consequently the forms of the single and two-equal-tailed probability of Type I
error will be the same except for the constant $\frac{1}{2}$. If we let
$\beta_1(\delta, K, k, n_1, n_2) = \int_{t_0}^{\infty} g(t)\, dt$ denote the
single upper tail power function of $t_k$, from (2.1.1) we obtain

\[
\beta_1(\delta, K, k, n_1, n_2) = \tfrac{1}{2} e^{-\delta^2/2}(K/k)^{n_2/2}
\sum_{j=0}^{\infty} \sum_{r=0}^{\infty}
\frac{(\delta^2/2)^{r/2}\, \Gamma\left(\frac{n_2}{2} + j\right)(1 - K/k)^j}
{\Gamma\left(\frac{r}{2} + 1\right) \Gamma\left(\frac{n_2}{2}\right) j!}\,
I_{x_0}\left(\frac{n_1 + n_2}{2} + j, \frac{r + 1}{2}\right),
\tag{2.1.2}
\]

where $x_0 = (1 + bt_0^2)^{-1}$ and $I_{x_0}(p, q)$ is the incomplete beta
ratio. To obtain the two equal tailed power function
$\beta_2(\delta, K, k, n_1, n_2)$ we need only change $r$ into $2r$ and omit the
factor $\frac{1}{2}$.

2.2. Distribution of $t_k$ for even values of $n_1$ and $n_2$. (For notation
refer to Section 2.1.) When $n_1$ and $n_2$ are even, the method of
characteristic functions yields a single infinite series for the distribution of
$t_k$, and when $\delta = 0$ this series reduces to a finite number of terms.
The characteristic function of $X = b\chi_1^2 + c\chi_2^2$ is given by
$(1 - 2bi\tau)^{-n_1/2}(1 - 2ci\tau)^{-n_2/2}$. To obtain the form of the
frequency function of $X$ we make use of the inversion theorem and integrate
round a standard contour in the lower half of the complex plane. The
distribution of $t_k$ can then be obtained from the joint probability element of
$\xi$ and $X$. We obtain the following form for the single tailed power function
of $t_k$:


(2.2.1)


($K > k$)


where $x_0$ has been defined in the previous section and
$x'_0 = (1 + ct_0^2)^{-1}$.

2.3. Unbiasedness of a test based on $t_k$. Since the single and two tailed
forms of the power function of $t_k$ (Section 2.1) are essentially the same
functions of the standardised 'distance' $\delta$, following Hsu [3] we can show
that $\partial\beta_1/\partial\delta > 0$ and
$\partial\beta_2/\partial\delta > 0$ for any fixed $K$ and $k$; and consequently
such a generic type of statistic provides an unbiased test both against
symmetric and asymmetric alternatives.

2.4. Variations in the power function and the probability of Type I error of $t_k$.
For the case $k = 1$, Hsu [3] has already shown that the probability of Type I
error of the statistic $t_1$ is subject to large variations w.r.t. $K$. He also pointed
out that the behavior of the derivative of its power function w.r.t. $K$ for fixed $\delta$
was similar to that of its probability of Type I error w.r.t. $K$. We shall presently
see that $t_k$ also shares this property with $t_1$.

In the first place one would like to know if any choice of $k$ in a small neighborhood
of $K$ would stabilize the variations in the Type I error of $t_k$ to such an
extent as to make it approximately insensitive to the difference between $k$ and
$K$. With this end in view we shall examine the nature of variations in the probability
of Type I error of $t_k$ w.r.t. $K$ for any fixed $k$.

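From (2.1.2) by putting $\delta = 0$ we obtain

(2.4.1)  $P = P(t_k > t_0) = \tfrac{1}{2} (K/k)^{n_2/2} \sum_{h=0}^{\infty} \Gamma\!\left(\tfrac{n_2}{2} + h\right) (1 - K/k)^h \left(\Gamma\!\left(\tfrac{n_2}{2}\right) \Gamma(h + 1)\right)^{-1} I_{x_0}\!\left(\tfrac{n_1 + n_2}{2} + h,\ \tfrac{1}{2}\right)$.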
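The null series above can be checked numerically. The following is a minimal sketch, not the author's computation. It assumes (these definitions are not visible in this part of the text) that the samples have sizes $n_1 + 1$ and $n_2 + 1$, that $K = \sigma_1^2/\sigma_2^2$, and that $t_k$ pools the two sums of squares as $S_1/k + S_2$ on $n_1 + n_2$ d.o.f., so that the denominator is $b\chi^2_{n_1} + c\chi^2_{n_2}$ with $b/c = K/k$. The function name `two_tailed_type1` and the helper names are hypothetical. The chi-square with the larger coefficient is expanded as a negative-binomial mixture, and the routine returns the two-tailed probability $P(|t_k| > t_0)$, i.e. twice the single-tail series.

```python
import math


def _beta_cf(a, b, x, max_iter=1000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Incomplete beta ratio I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_tailed_type1(K, k, n1, n2, t0, max_terms=5000):
    """P(|t_k| > t0) under the null hypothesis, via the mixture series."""
    size1, size2 = n1 + 1, n2 + 1            # assumed sample sizes
    c = (k * size2 + size1) / ((n1 + n2) * (K * size2 + size1))
    b = (K / k) * c                          # assumed coefficients, b / c = K / k
    if b <= c:
        scale, ratio, m = b, b / c, n2       # expand the chi-square with n2 d.o.f.
    else:
        scale, ratio, m = c, c / b, n1       # expand the chi-square with n1 d.o.f.
    x0 = 1.0 / (1.0 + scale * t0 * t0)
    half_df = 0.5 * (n1 + n2)
    if ratio >= 1.0:                         # b == c: ordinary Student's t
        return reg_inc_beta(half_df, 0.5, x0)
    log_r, log_q = math.log(ratio), math.log1p(-ratio)
    total, mass = 0.0, 0.0
    for h in range(max_terms):
        log_w = (math.lgamma(0.5 * m + h) - math.lgamma(0.5 * m)
                 - math.lgamma(h + 1.0) + 0.5 * m * log_r + h * log_q)
        w = math.exp(log_w)                  # negative-binomial mixture weight
        total += w * reg_inc_beta(half_df + h, 0.5, x0)
        mass += w
        if 1.0 - mass < 1e-12:               # weights sum to one
            break
    return total
```

Under these assumptions, `two_tailed_type1(5, k, 2, 4, 2.447)` gives about .1129, .0500 and .0355 for `k = 1, 5, 7`, which agree with the first, fifth and seventh entries of Table I; the agreement holds for the two-tailed probability with $K = 5$.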

We now differentiate (2.4.1) and after simplification obtain

$\partial P/\partial K < C_1 (K/k)^{-1} [n_2(n_2 + 1) - n_1(n_1 + 1)/k]\,[K(n_1 + 1) + n_2 + 1]^{-1}$   $(K < k)$.

Similarly

$\partial P/\partial K > C_2\, [n_2(n_2 + 1) - n_1(n_1 + 1)/k]\,[K(n_1 + 1) + n_2 + 1]^{-1}$   $(K > k)$,

where $C_1$ and $C_2$ are certain positive constants independent of $K$ and $k$.

If $k = \dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$ we have $\partial P/\partial K \lessgtr 0$ for $K \lessgtr k$.

This is the case when $t_k$ is identical with the statistic $v$ defined in Section 1
and the probability of Type I error curve expressing $P$ as a function of $K$ has a
minimum at this point: for $n_1 < n_2$ the minimum occurs for a value of $K < 1$
and vice versa. And since $v$ is known to be insensitive to the variation of $K$ [3],
$t_k$ is insensitive to the variation of $K$ for this value of $k$.

For any other assumed value of $k$ the curve either starts decreasing from
$K = \infty$ or from $K = 0$ to the point where $K = k$ depending upon the values of
$n_1$ and $n_2$. In each case the ordinate of the curve continues to decrease for some
distance; it may decrease to a minimum and then start increasing or else decrease
indefinitely. For fixed $\delta$ the power function of $t_k$ also has a minimum when
$K = k = \dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$ and for any other $k$ the behavior of its power function is
similar to that of its probability of Type I error. For the case $k = 1$ numerical
values of the single and two-tailed values of the probability of Type I error
and power function for different values of $n_1$ and $n_2$ and $K$ are given in Tables II
and III (Section 2.5).

In certain practical situations it may happen, for example, that on the basis
of past experience one can determine $k$ so that $\tfrac{1}{2} \le |k - K| < 2$. The question
arises: how sensitive is $t_k$ to such a neighborhood for any $k$, $K$, $n_1$ and $n_2$?
That it is hard to provide a practically useful answer to this question will be
apparent from the nature of the distribution of $t_k$, which depends both on
$K$ and $k$ and not merely on their ratio. The following Table I will indicate how
in such a small neighborhood $P(t_k > t_0)$ can be in serious error in two different
directions.

2.5. Statistics $t_1$ and $v$ in relation to asymmetric and symmetric aspects of
"Student's" hypothesis. Statistics $t_1$ and $v$ are special cases of $t_k$ and the behavior
of their probability of Type I error and power function has already been discussed
(Sections 2.3 and 2.4). In this section we compare the single-tailed and
two-tailed values of the probability of Type I error and power function in the light
of several particular examples. In all these calculations e.g. in $P(t_1 > t_0)$ and

TABLE I

Variations in $P(t_k > t_0)$ with respect to $k$ for fixed $K$
($K = 5$; $n_1 = 2$, $n_2 = 4$, $t_0 = 2.447$)

k    1      2      3      4      5      6      7
P    .1129  .0936  .0749  .0607  .05    .0418  .0355


TABLE II

Variations in the symmetric and asymmetric probability of Type I error of $v$ and $t_1$ in
relation to the unknown ratio of variances $K$

Statistic  d.o.f.            K = 0    .125     .5       1      2      4              8      16     ∞      % point of tabulated t_0
v = t_1    n_1 = n_2 = 3     .074     .0633    .0604    .05    .0504  .0568          .0633  .0691  .074   single tailed 5%
                             .1002    .0881    .0525    .05    .0525  .0597          .0981  .0770  .1002  two-tailed 5%
                             .034     .0181    .0110    .01    .0110  .0138          .0181  .0227  .034   two-tailed 1%
v          n_1 = 4, n_2 = 8  .0112    .0120†   .0142    .0106  .0227  .0266          .0293  .0305  .0324  single tailed 1%
                             .012     .0101†   .0187    .0233  .0291  .0360          .0407  .0433  .0405  two-tailed 1%
           n_1 = 8, n_2 = 4  .076     [illegible]  .0698  .0543  .0541  [illegible]‡  .0521  .0531  .056   single tailed 5%
t_1        n_1 = 4, n_2 = 8  .00011   .00043   .00310   .01    .0221  .0483          .0793  .0804  .133   single tailed 1%
                             .00007   .00031   .00244   .01    .0310  .0592          .1169  .1544  .222   two-tailed 1%
           n_1 = 8, n_2 = 4  .1342    .1050    .0710    .05    .0308  .0287          .0246  .0224  .0204  single tailed 5%

† $P$ = .01 when $K$ = .074
‡ $P$ = .05 when $K$ = 3.0


$P(|t_1| > t_0')$, $t_0$ refers to the single and $t_0'$ to the two tailed values of Fisher's $t$
for the appropriate number of d.o.f. Tables II and III give the approximate
values for the probability of Type I error and the power function respectively
both against symmetric and asymmetric alternatives.

For equal sample sizes ($v = t_1$) the Type I error and power function curves,
representing probability of Type I error and power function as a function of $K$,
have a minimum when $K$ is unity and a maximum occurs when $K$ is either zero or
infinity. Maximum values of the probability of Type I error for several equal
sample sizes are given in Table IV. It appears that for equal sample sizes the
probability of Type I error and the power function are likely to be insensitive
to the variation of $K$. We also notice in this connection that while the single
tailed values of the probability of Type I error are less than those of the two
tailed values, the values of the two tailed power function for $\delta = 1$ are less
than the corresponding single tailed values. This appears to be true also for the
statistic $v$ when $n_1 \ne n_2$. For unequal sample sizes also the probability of Type I
error and the power function of $t_1$ are likely to be more sensitive to the variation
of $K$ than those of $v$. It may be pointed out in the sequel that while it is recognized
that for unequal d.o.f. a fair comparison of the probability of Type I error and
the power function of $v$ with those of $t_1$ ought to adjust $v$ and $t_1$ to the same level
of significance, namely the same maximum (for all $K$) probability of Type I
error, this would not alter our conclusions about the sensitive nature of $t_1$.
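The contrast between the pooled statistic and $v$ described above can be illustrated by simulation. The sketch below is an illustration only, not the exact calculation behind Tables II to IV. It assumes that $t_1$ is the ordinary pooled two-sample statistic, that $v$ is the statistic with separately estimated variances of the means, and that $K = \sigma_1^2/\sigma_2^2$; the function names are hypothetical, and `m1`, `m2` are sample sizes (the paper's $n_1 + 1$, $n_2 + 1$).

```python
import math
import random


def pooled_and_welch(x, y):
    """Return (t1, v): pooled-variance and separate-variance statistics."""
    m1, m2 = len(x), len(y)
    mean1, mean2 = sum(x) / m1, sum(y) / m2
    ss1 = sum((u - mean1) ** 2 for u in x)
    ss2 = sum((u - mean2) ** 2 for u in y)
    pooled_var = (ss1 + ss2) / (m1 + m2 - 2)
    t1 = (mean1 - mean2) / math.sqrt(pooled_var * (1.0 / m1 + 1.0 / m2))
    v = (mean1 - mean2) / math.sqrt(ss1 / (m1 * (m1 - 1)) + ss2 / (m2 * (m2 - 1)))
    return t1, v


def rejection_rates(m1, m2, K, crit, reps=20000, seed=0):
    """Monte Carlo estimate of P(|t1| > crit) and P(|v| > crit) under equal means."""
    rng = random.Random(seed)
    sd1 = math.sqrt(K)                       # sigma_1, with sigma_2 = 1
    hits_t = hits_v = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(m1)]
        y = [rng.gauss(0.0, 1.0) for _ in range(m2)]
        t1, v = pooled_and_welch(x, y)
        hits_t += abs(t1) > crit
        hits_v += abs(v) > crit
    return hits_t / reps, hits_v / reps
```

For example, `rejection_rates(9, 5, 0.125, 2.179)` compares the two statistics at the two-tailed 5% point of $t$ with 12 d.o.f. when the smaller sample has the larger variance. For equal sample sizes the two statistics coincide, in line with the remark $v = t_1$.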

TABLE III¹

Variations in the asymmetric and symmetric power function of $t_1$ and $v$ corresponding to the
5% point of tabulated $t_0$ ($\delta = 1$)

Statistic  d.o.f.            K = 0   .5     1      2      ∞      Alternative
v = t_1    n_1 = n_2 = 3     .189    .111   .137   .141   .180   symmetric
                             .269    .229   .2255  .229   .209   asymmetric
t_1        n_1 = 8, n_2 = 4  .351    .202   .152   .112   .003   symmetric
                             .428    .204   .242   .194   .122   asymmetric
v          n_1 = 8, n_2 = 4  .208    .100   .162   .156†  .168   symmetric
                             .286    .250   .247   .244‡  .255   asymmetric

† minimum of .152 is reached for $K$ = 3.6.
‡ minimum of .242 is reached for $K$ = 3.6.





TABLE IV

Maximum probability of Type I error of $v$ ($= t_1$) for equal degrees of freedom

                       Symmetric                   Asymmetric
n_1 + 1 = n_2 + 1      5%           1%             5%           1%
7                      .0721        .0224          [illegible]  .0182
9                      [illegible]  [illegible]    [illegible]  .0162
11                     [illegible]  [illegible]    .0576        .0160
15                     .0598        .0152          .0555        .0136
21                     .0669        .0137          .0538        .0125


¹ The author acknowledges with pleasure the help given in the preparation of this table
by Miss Elizabeth Shuhany of the Statistical Laboratory, Boston University.
² Values taken from [7].

2.6. Statistic $v$, Scheffé's test and paired difference $t$. If $K$ is known, $v$ or Scheffé's
statistic $S$ should not be used. If $K$ is unknown, $S$ is an ingenious device for
getting a Student's $t$ with $\min(n_1, n_2)$ d.o.f. and provides the only exact unbiased
test so far known. In such a situation since nothing is known about $K$, a
fair comparison of the power function of $S$ with $v$ ought to adjust $v$ to the same
maximum probability of Type I error for all $K$ (maximum will occur for $K = 0$
or $K = \infty$ according as $n_1 \gtrless n_2$); and at such a maximum significance level it is
recognized that $v$ cannot be uniformly better than $S$. For samples of equal
size $n$ the use of the paired difference $t$ with $n - 1$ d.o.f. (equivalent to $S$ when
$n_1 = n_2$; Section 1) provides a suitable test for two reasons: (i) it is exact and
(ii) as shown by Walsh [8] it has a high power efficiency.

If any approximate a priori information about $K$ is available, $v$ appears to
be the only suitable statistic to utilize such information. While $S$ was not intended
to cope with such a situation, $t_k$ (Section 2.4) has been shown to be unsuitable.
Since $v$ is insensitive to the variation of $K$, we shall not be far wrong in using
'effective' d.o.f. based upon an assumed value $k$ of $K$ satisfying some such relation
as $\tfrac{1}{2} < |k - K| < 2$. The effective d.o.f. of $v$ as given by Welch [1] and as given
by $P = P(v > t_0)$ or by $P = P(|v| > t_0')$ for fixed $P$ (listed in Table V as calculated
d.o.f.) are identical for (i) $K = 0$, 1, and $\infty$ ($n_1 = n_2$) and (ii) $K = 0$, [illegible],
and $\infty$ ($n_1 \ne n_2$). For other values of $K$ it appears from Table V that Welch's
formula errs on the conservative side. The effective number of d.o.f. varies between
$n_1 + n_2$ and $\min(n_1, n_2)$ (cf. d.o.f. for $S$). Consequently in the absence of any


Sample Size 


iii -f l m m + l “ 
tu + l ^ u* + t *■ 
ni + 1 ■“ D, ni + 1 


TABLE V 

Adjusted power function of v in the light of ‘ effective’ degrees of freedom 

uc.„ I Adjuited asymmetric power function of r | Effective d.o t, 

lor probability ol Type I error of .05 


K ~ 

6 - 

0 ,125 

1 

4 

OO 

K = 0 

5 => 
125 

2 

4 

| Calculated 

- i*=° 125 4 

» 

174 

204 

201 

174 

384 

478 

470 

.384 1 2 3 30 

3 36 

2 

225 

. 23G 

23 fi 

225 

550 

.5*1 

5K1 

650 ' 6 0 H 

0 H 

6 

| 210 

227 

242 

233 

504 

.650 

504 

572 ’ 4 0 50 

11 90 

F 


Welch’s formula 


2 2 04 2 94 2 
8 8 82 8 82 6 
4 6 14 11 90 8 


best unbiased test and in the light of any approximate information about K it 
would appear that v has a decided advantage over any other statistic. 
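The range of 'effective' d.o.f. quoted above can be illustrated with the commonly used Welch-Satterthwaite form of Welch's approximation, written in terms of population quantities. This is a sketch under assumptions: whether this is exactly the variant tabulated in Table V is not established here, and the parameter `theta` (the ratio of the variance of the first mean to that of the second) and the function name are introduced only for illustration.

```python
def welch_effective_dof(theta, n1, n2):
    """Welch-Satterthwaite effective d.o.f. for variance-of-means ratio theta.

    n1 and n2 are the d.o.f. of the two variance estimates (assumed notation).
    """
    return (theta + 1.0) ** 2 / (theta ** 2 / n1 + 1.0 / n2)


if __name__ == "__main__":
    for theta in (0.0, 0.5, 2.0, 8.0, 1e9):
        print(theta, round(welch_effective_dof(theta, 8, 4), 3))
```

In this form the effective d.o.f. equal $n_2$ at `theta = 0`, approach $n_1$ as `theta` grows, and reach their maximum $n_1 + n_2$ at `theta = n1 / n2`, which matches the statement that they vary between $n_1 + n_2$ and $\min(n_1, n_2)$.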

2.7. The Behrens-Fisher test in repeated sampling. Consider the statistic

$d = (\bar x - \bar x')(s_1^2 + s_2^2)^{-1/2} = t_1 \sin\theta - t_2 \cos\theta$,

where $s_1^2$ and $s_2^2$ are the unbiased estimates of the variances of the means $\bar x$ and $\bar x'$
respectively, $t_1$ and $t_2$ have independent "Student's" distributions with $n_1$ and $n_2$
d.o.f. respectively, and $\tan\theta = s_1/s_2$. On the basis of the 'fiducial' distribution of
$\sigma_1$ and $\sigma_2$ Fisher [6] regards $d$ as a "mixture" of $t_1$ and $t_2$ with constant coefficients.
It is to be noted that if $s_1$ and $s_2$ are fixed in the classical sense $t_1$ and $t_2$ have
independent normal conditional distributions with zero means and variances
$\sigma_1^2/s_1^2$ and $\sigma_2^2/s_2^2$ respectively; and if $s_1$ and $s_2$ vary in their own distribution $d$ is
identical with $v$ (Section 1).

Neyman [9] considered the integral of the joint probability law of $\bar x$, $\bar x'$, $s_1$, $s_2$
over the set [left-hand side illegible] $< t_1 \sin\theta - t_2 \cos\theta$ where the quantity on the right also
depends upon $s_1/s_2$ and is the quantity $d$ tabulated by Sukhatme [10].
Neyman showed in particular that if pairs of normal populations with different $K$
are sampled ($n_1 + 1 = 13$, $n_2 + 1 = 7$), then the relative frequency of correct
statements about $m_1 - m_2$ based on the 5% points of $d$ will not be equal to the
expected .95 and will vary with $K$.

We consider here the following similar type of question: what is the nature of
discrepancies that will arise in the probability of Type I error by the repeated
use of the Behrens-Fisher test in sampling from two normal populations? We
observe that since $d$ and $v$ have the same structural form, the appropriate
probability of Type I error in such a situation will be given by the probability
integral of $v$ (Sections 2.2 and 2.5).


TABLE VI

Minimum and maximum† values of $P(|v| > d_0)$ for different values of $K$

Sample sizes                        K = 0   .5      1         2       ∞       d_0
n_1 + 1 = n_2 + 1 = 7        min    .05     .0321   .0307     .0321   .05     2.447
                             max    .0508   .0329   .0313     .0329   .0508   2.435
n_1 + 1 = n_2 + 1 = 9        min    .05     .0362   .0346     .0362   .05     2.306
                             max    .0512   .0367   .0358     .0367   .0512   2.292
n_1 + 1 = n_2 + 1 = 13       min    .05     .0405   .03[?]6   .0405   .05     2.179
                             max    .0507   .0434   .0403     .0434   .0507   2.170
n_1 + 1 = 7, n_2 + 1 = 13    min    .0307   .0281   .0317     .0393   .05     2.447
                             max    .05     .0460   .0516     .0697   .0720   2.170
n_1 = n_2 = ∞                       .05     .05     .05       .05     .05     1.960

† Maximum values (set in bold type in the original) are the rows marked max.


We observe that $P(|v| > x)$ is a monotone decreasing function of $x$ for any
fixed $K$, $n_1$ and $n_2$. Furthermore for fixed $x$, $n_1$ and $n_2$ we have $\partial P/\partial K \gtreqless 0$ for (i)
$K \gtreqless 1$, $n_1 = n_2$ and (ii) $K \gtreqless \dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$, $n_1 \ne n_2$. Table VI gives the minimum
and maximum values of $P(|v| > d_0)$ for different values of $K$ where $d_0$ corresponds
to the highest and lowest value of tabulated $d$. It appears that for equal
sample sizes the minimum probability of Type I error is less than .05 and will
converge to .05 when $K$ is either infinity or zero. The maximum probability of
Type I error converges to a value slightly higher than .05. This probability also
converges to .05 with increasing size of equal samples for every $K$. For unequal
sample sizes, e.g. $n_1 < n_2$, the minimum values converge to .05 when $K = \infty$ and
if $n_1 > n_2$, this convergence takes place when $K = 0$. The maximum values
are both greater and less than .05.


3. Hypothesis of equality of regression coefficients when residual variances 
are unequal. 

3.1. Unbiasedness of tests based on statistics $t^*$ and $v^*$. Consider

$t^* = (b_1 - b_2) \left[ \frac{S(y - Y)^2 + S'(y' - Y')^2}{n_1 + n_2 - 2} \left( \frac{1}{M_1} + \frac{1}{M_2} \right) \right]^{-1/2}$

and

$v^* = (b_1 - b_2) \left[ \frac{S(y - Y)^2}{M_1(n_1 - 1)} + \frac{S'(y' - Y')^2}{M_2(n_2 - 1)} \right]^{-1/2}$,

where $b_1$ and $b_2$ are regression coefficients calculated from independent samples; $Y$
and $Y'$ are the sample regression functions; $M_1 = S(x - \bar x)^2$ and $M_2 = S'(x' - \bar x')^2$.
Under the assumptions of Section 1 these two quantities are distributed as

$t^* = (\xi + \Delta)(\mu_1 \chi^2_{1, n_1 - 1} + \mu_2 \chi^2_{2, n_2 - 1})^{-1/2}$,
$v^* = (\xi + \Delta)(\lambda_1 \chi^2_{1, n_1 - 1} + \lambda_2 \chi^2_{2, n_2 - 1})^{-1/2}$,

respectively, where $\xi$ is $N(0, 1)$ and the $\chi^2$'s have independent $\chi^2$-distributions
with d.o.f. indicated in the second subscripts, and where

$M_1/M_2 = w$,
$\mu_1 = K(w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1}$,
$\mu_2 = (w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1}$,
$\Delta = (\beta_1 - \beta_2) \left( \frac{\sigma_1^2}{M_1} + \frac{\sigma_2^2}{M_2} \right)^{-1/2}$,
$\lambda_1 = K(K + w)^{-1}(n_1 - 1)^{-1}$,
$\lambda_2 = w(K + w)^{-1}(n_2 - 1)^{-1}$,
$\phi = \frac{1}{w} \cdot \frac{n_2 - 1}{n_1 - 1}$.

Consequently these two statistics have the same basic distribution as obtained
previously for $t_k$ (Section 2.1) and their power functions are monotone increasing
functions of the standardized 'distance' $\Delta$ for fixed values of $K$, $w$, $n_1$ and $n_2$.
While the statistic $t^*$ has "Student's" distribution with $n_1 + n_2 - 2$ d.o.f.
whenever $K = 1$, the statistic $v^*$ is only so distributed when $K = w(n_1 - 1)(n_2 - 1)^{-1}$.

3.2. Variations in the probability of Type I error and power function of $t^*$ and $v^*$.
The behavior of the partial derivatives of the probability of Type I error and
the power function of $t^*$ and $v^*$ w.r.t. $K$ and also in relation to $w$ is essentially
the same. For purposes of illustration we shall only consider the behavior of the
probability of Type I error. We shall presently see that for the hypothesis
$\beta_1 = \beta_2$ (cf. "Student's" hypothesis $m_1 = m_2$) while $t^*$ is sensitive to the variation
of $K$ and $w$, $v^*$ is insensitive to both.

3.2.1. Variations w.r.t. $K$ for fixed $w$. Remembering that the $\chi^2$'s in the denominator
of $t^*$ have respectively $n_1 - 1$ and $n_2 - 1$ d.o.f., we can write down
$P(t^* > t_0)$ from the corresponding form for $t_k$ (Section 2.3). After simplification
we obtain

(3.2.1.1)  $\partial P/\partial K < L_1 [(n_2 - 1) - w(n_1 - 1)](K + w)^{-1}/K$   $(K < 1)$,

where $z_0 = (1 + \mu_1 t_0^2)^{-1}$. If we make use of the relation $P(n_1, n_2, M_1, M_2, K) =
P(n_2, n_1, M_2, M_1, K^{-1})$ in (3.2.1.1) we obtain

(3.2.1.2)  $\partial P/\partial K > L_2 (K + w)^{-1} [(n_2 - 1) - w(n_1 - 1)]$   $(K > 1)$,

where $L_1$ and $L_2$ are certain positive constants independent of $M_1$, $M_2$ and $K$.
Similarly for the statistic $v^*$ we have

(3.2.1.3)  $\partial P/\partial K < D_1 (K\phi)^{-1} [(n_2 - 1) - w(n_1 - 1)]/(K + w)$   $(K\phi < 1)$,

(3.2.1.4)  $\partial P/\partial K > D_2 [(n_2 - 1) - w(n_1 - 1)]/(K + w)$   $(K\phi > 1)$,

where $D_1$ and $D_2$ are certain positive constants independent of $K$, $M_1$ and $M_2$ and
where $\phi = \frac{1}{w} \cdot \frac{n_2 - 1}{n_1 - 1}$. We notice that if (i) $n_1 = n_2$ and $w = 1$ or (ii) $w = \frac{n_2 - 1}{n_1 - 1}$
we have $t^* = v^*$ and both from (3.2.1.1), (3.2.1.2) and from (3.2.1.3), (3.2.1.4)
we obtain $\partial P/\partial K \lessgtr 0$ for $K \lessgtr 1$. In the case (i) the maximum probability of Type I
error occurs at $K = \infty$ and $K = 0$. In case (ii) the maximum will sometimes
occur for $K = 0$ and sometimes for $K = \infty$, depending on the relative magnitude
of $n_1$ and $n_2$.

For other situations $t^*$ and $v^*$ exhibit a type of behavior essentially similar
to that of $t_1$ and $v$ (Section 2.5). We notice that the $(P, K)$ curve for $v^*$ has a
minimum when $K = w\,\frac{n_1 - 1}{n_2 - 1}$. If $n_1 = n_2$, the minimum point is given by
$K = w$. Therefore with an approximate knowledge of $K$, a useful practical hint
to remember is to so adjust $M_1$ and $M_2$ as to have $w$ approximately equal to $K$.
If $n_1 \ne n_2$ any information about $\sigma_1^2$ being greater or less than $\sigma_2^2$ can be used
with decided advantage to adjust $M_1$, $M_2$, $n_1$ and $n_2$ so as to reduce considerably
the risk of the first kind and thus work in a region of the $(P, K)$ curve where
there is not much danger of bias in the probability of Type I error. This will
also reduce the fluctuations of the power function of $v^*$ about its minimum which
also occurs for $K = w\,\frac{n_1 - 1}{n_2 - 1}$.

3.2.2. Variations in relation to $w$ for fixed $K$. The partial derivative of $P(t^* > t_0)$
with respect to $w$ is given by

(3.2.2.1)  [display only partly legible in the source: a factor in $(1 - K)$, $K$ and $(K + w)^{-1}$ multiplying a series $\sum_h (1 - K)^h\, \Gamma(\cdot + h)\, z_0^{(n_1 + n_2 - 2)/2 + h} (\cdots)$]   $(K < 1)$.

Therefore $\partial P/\partial w > 0$ for $K < 1$. Similarly $\partial P/\partial w < 0$ for $K > 1$.

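To justify the differentiation of the series in (3.2.2.1) we make use of the result

$I_{z_0}\!\left(\tfrac{n_1 + n_2 - 2}{2} + h,\ \tfrac{1}{2}\right) - I_{z_0}\!\left(\tfrac{n_1 + n_2 - 2}{2} + h + 1,\ \tfrac{1}{2}\right) = \frac{(1 - z_0)^{1/2}\, z_0^{(n_1 + n_2 - 2)/2 + h}}{\left(\tfrac{n_1 + n_2 - 2}{2} + h\right) B\!\left(\tfrac{n_1 + n_2 - 2}{2} + h,\ \tfrac{1}{2}\right)}$,

and consequently the series under consideration may be shown to be dominated
by an absolutely and uniformly convergent series for $0 < K < 1$.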
For the statistic $v^*$ consider

(3.2.2.2)  $P(v^* > t_0) = \tfrac{1}{2} (K\phi)^{(n_2 - 1)/2} \sum_{h=0}^{\infty} (1 - K\phi)^h\, \Gamma\!\left(\tfrac{n_2 - 1}{2} + h\right) \left[ \Gamma(h + 1)\, \Gamma\!\left(\tfrac{n_2 - 1}{2}\right) \right]^{-1} I_{y_0}\!\left(\tfrac{n_1 + n_2 - 2}{2} + h,\ \tfrac{1}{2}\right)$   $(K\phi < 1)$,

where $y_0 = (1 + \lambda_1 t_0^2)^{-1}$. We notice from (3.2.2.2) and from the form of quantities
$\lambda_1$ and $\lambda_2$ (Section 3.1) that $P(v^* > t_0)$ depends on $K$ and $w$ only through the
product of $K$ and $1/w$. Consequently variations of $P$ w.r.t. $1/w$ for fixed $K$
are the same as those of $P$ w.r.t. $K$ for fixed $w$. Thus we may directly infer that
$P(v^* > t_0)$ will be insensitive to the variations of $w$. The following Table VII
will illustrate the nature of variations in the probability of Type I error in the
tests based on $t^*$ and $v^*$ in relation to $w$.


TABLE VII

Variations in the probability of Type I error of $t^*$ and $v^*$
($K = 2$; $n_1 = n_2 = 7$; $t_0 = 1.782$)

w               0      .25    .5     1      2      ∞
P(t* > t_0)     .0259  .0358  .0427  .0512  .0594  .0866
P(v* > t_0)     .0625  .0570  .0539  .0512  .05    .0625


It would appear that on the analogy of statistics $t_1$ and $v$ for the comparison of
two means one could guess about the sensitive nature of $t^*$ in relation to the
variations of the 'nuisance' parameter $K$. The additional drawback in $t^*$ which
stems from the monotone nature of its variations with respect to $w$ is a further
warning against the use of a $t^*$ type statistic for the hypothesis $\beta_1 = \beta_2$ when
$\sigma_1^2 \ne \sigma_2^2$.


4. Hypothesis of equality of two linear regression functions when variances 
are unequal. 

4.1. The statistic $Z$. (For notation refer to Sections 2.1 and 3.1). Consider the
model given in Sections 1 and 3 for the comparison of two regression coefficients.
If the variances are equal, the statistic based on the likelihood ratio criterion
for the composite hypothesis $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$ is given by

$Z = \frac{(\bar y - \bar y')^2 (n_1 + 1)(n_2 + 1)(n_1 + n_2 + 2)^{-1} + (b_1 - b_2)^2 (M_1^{-1} + M_2^{-1})^{-1}}{S(y - Y)^2 + S'(y' - Y')^2}$.

The quantity $Z$ is distributed like the ratio of two independently distributed $\chi^2$'s
and consequently its distribution is precisely determined under the hypothesis.
If $\sigma_1^2 \ne \sigma_2^2$, $Z$ can be put in the form of

$Z = (a_1 \chi^2_{1,1} + a_2 \chi^2_{2,1})(K \chi^2_{1, n_1 - 1} + \chi^2_{2, n_2 - 1})^{-1}$,

which is now distributed as the ratio of 'mixtures' of independently distributed
$\chi^2$'s with d.o.f. indicated in the second subscripts and where

$a_1 = [n_1 + 1 + K(n_2 + 1)](n_1 + n_2 + 2)^{-1}$,
$a_2 = (K + w)(1 + w)^{-1}$.

In the non-null case when $\alpha_1 \ne \alpha_2$, $\beta_1 \ne \beta_2$ the numerator of $Z$ is a mixture of
non-central squares. If we let $\beta(K, w, \delta, \Delta, n_1, n_2)$ denote the power function
of $Z$, following Robbins and Pitman [12] we obtain

(4.1.1)  $\beta(K, w, \delta, \Delta, n_1, n_2) = \sum_{j=0}^{\infty} \sum_{h=0}^{\infty} \sum_{k=0}^{\infty} c_j\, d_h\, p_k\, I(\cdots)$   $\left( K > 1,\ w < \frac{n_1 + 1}{n_2 + 1} \right)$,

where

$c_j = \frac{\Gamma(j + \frac{1}{2})}{\Gamma(\frac{1}{2})\, j!} (1 - a_1/a_2)^j$,
$p_k = e^{-D^2/2} (\tfrac{1}{2} D^2)^k / k!$   $(D^2 = \delta^2 + \Delta^2)$,

and the coefficients $d_h$ and the arguments of the incomplete beta ratio $I$ are not legible in the source.


4.2. Variations in the probability of Type I error and the power function of $Z$.
Corresponding to (4.1.1) we obtain the expression for the probability of Type I
error $P(Z > Z_0)$ by putting $D = 0$ and $k = 0$. It has not been possible to establish
any definite law concerning the behavior of the probability of Type I error
and the power function w.r.t. the 'nuisance' parameter $K$. However we shall
presently establish their monotone dependence on the variable parameter $w$.

We differentiate $P(Z > Z_0)$ with respect to $w$ and after simplification obtain

$\partial P/\partial w = $ [display illegible in the source] $< 0$

for $K > 1$, $w < \frac{n_1 + 1}{n_2 + 1}$. Similarly by utilizing an appropriate expression for
$P(Z > Z_0)$ for $K > 1$, $w > \frac{n_1 + 1}{n_2 + 1}$ we can show that $\partial P/\partial w < 0$. For the case
$K < 1$ it can be shown that $P(Z > Z_0)$ is a monotone increasing function of $w$.
This is also true of the dependence of the power function of $Z$ on $w$.

4.3. Unbiasedness of $Z$. We differentiate (4.1.1) w.r.t. $\delta$ and $\Delta$ and after
simplification obtain $\partial\beta/\partial\delta > 0$, $\partial\beta/\partial\Delta > 0$. Thus the power function of $Z$ has a relative
minimum at $\delta = 0$, $\Delta = 0$.

The author is greatly indebted to Professors Harold Hotelling and William G.
Madow for guidance in this research and to the referees for many useful suggestions
and criticisms.


REFERENCES

[1] B. L. Welch, "The significance of the difference between two means when the population
variances are unequal", Biometrika, Vol. 29 (1938), pp. 350-362.

[2] M. G. Kendall, The Advanced Theory of Statistics, Vol. 2.

[3] P. L. Hsu, "Contribution to the theory of 'Student's' t-test as applied to the problem
of two samples".

[4]-[7] [entries largely illegible in the source; surviving fragments: "Annals of Eugenics"; "Jour. Roy. Stat.
Soc., Supp., Vol. 2 (1936), pp. 107-180"]

[8] J. E. Walsh, "On the power efficiency of a t-test formed by pairing sample values",
Annals of Math. Stat., Vol. 18 (1947), pp. 601-604.

[9] J. Neyman, "Fiducial argument and the theory of confidence intervals", Biometrika,
Vol. 32 (1941), pp. 128-150.

[10] P. V. Sukhatme, "On Fisher and Behrens' test of significance for the difference in
means of two normal samples", Sankhyā, Vol. 4 (1938), pp. 39-48.

[11] R. A. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, 1943.

[12] H. Robbins and E. J. G. Pitman, "Application of the method of mixtures to quadratic
forms in normal variates", Annals of Math. Stat., Vol. 20 (1949), pp. 552-560.



THE EXTREMAL QUOTIENT 

By E. J. Gumbel and R. D. Keeney
New York City and Metropolitan Life Insurance Company 

Summary. The extremal quotient is defined as the ratio of the largest to the
absolute value of the smallest observation. Its analytical properties for symmetrical,
continuous and unlimited distributions are obtained from a study of
the auto-quotient defined as the ratio of two non-negative variates with identical
distributions. The relation of the two statistics is established by proving
that, for sufficiently large samples from an initial distribution with median zero,
the largest (or smallest) value may be assumed to be positive (or negative)
and that the extremes are independent. It follows that the distribution and the
probability of the extremal quotient possess certain symmetries, and that its
median is unity. As many moments exist for the extremal quotient as moments
and reciprocal moments exist simultaneously for the initial variate. The logarithm
of the extremal quotient is symmetrically distributed. These properties
hold for all continuous symmetrical unlimited variates which possess a monotonically
increasing probability function.

For the exponential type, the asymptotic distribution of the extremal quotient
can only be expressed by an integral. In this case, no moments exist. For
the Cauchy type, the asymptotic distribution is very simple, and the logarithm
of the extremal quotient has the same distribution as the midrange for initial
distributions of the exponential type.

It is not necessary to consider asymmetrical distributions since, in this case,
for sufficiently large samples, one of the extremes will outweigh the other,
unless the distribution is nearly symmetrical or has rapidly varying tails.


1. The auto-quotient and the extremal quotient. Let $x$ and $y$ be two independent
non-negative continuous variates, unlimited to the right. Let $f_1(x)$ and
$f_2(y)$ be the distributions (probability densities), and let $F_1(x)$ and $F_2(y)$ be
the probability functions. Then the joint distribution of the two variates is
their product. The quotient

(1.1)  $Q = x/y$

is also non-negative and unlimited to the right. Since

$x = yQ$;  $\dfrac{dx}{dQ} = y$,

the joint distribution $w(y, Q)$ of the quotient $Q$ and the variate $y$ is

(1.2)  $w(y, Q) = f_1(yQ)\, f_2(y)\, y$,

and the marginal distribution $h(Q)$ of the variate $Q$ alone becomes

(1.3)  $h(Q) = \int_0^{\infty} y\, f_1(yQ)\, f_2(y)\, dy$.

The quotient $Q$ possesses a mode if (and only if) $f_1(x)$ possesses a mode.

Assume now that the two variates $x$ and $y$ have the same distribution

(1.4)  $f_1(x) = f(x)$;  $f_2(y) = f(y)$

with the same parameter values. The quotient of two variates with identical
distributions is henceforth called the auto-quotient $q_a$. It may be realized if there
are two independent series of observations taken from the same population and
ordered in time. Each value from the first series is divided by the corresponding
value from the second series. Another realization consists in dividing each value
obtained in one series of independent observations by every other value. A
third realization is obtained by considering two asymmetrical distributions
$f_1(x)$ and $f_2(y)$ where $x \ge 0$, $y \le 0$, and

(1.4')  $f_2(y) = f_1(-x)$.

The two distributions are called mutually symmetrical, and the auto-quotient
is

$q_a = x/(-y)$.

From the definition of the auto-quotient it follows that the distribution of $q_a$
must be the same as the distribution of its reciprocal $r = 1/q_a$. The proof of this
statement is simple. Under the condition (1.4), the distribution $h(q_a)$ becomes,
from (1.3),

(1.5)  $h(q_a) = \int_0^{\infty} y\, f(yq_a)\, f(y)\, dy$.

The distribution $h_1(r)$ of the reciprocal is

$h_1(r) = \dfrac{1}{r^2} \int_0^{\infty} y\, f(y/r)\, f(y)\, dy$.

If $y/r$ is replaced by $x$, the distribution of $r$ is

(1.6)  $h(r) = h(q_a)$.

Thus, the distribution of the auto-quotient of a non-negative unlimited variate
is invariant under a reciprocal transformation.

The shape of the distribution $h(q_a)$ and the location of the mode may be obtained
from the density of probability $h(1/q_a)$ at the value $1/q_a$ (which differs,
of course, from the distribution $h_1(r)$ of $r = 1/q_a$). From (1.6) we obtain

$h(1/q_a) = \int_0^{\infty} y\, f(y/q_a)\, f(y)\, dy$.

The transformation

$y/q_a = z$,  $dy = q_a\, dz$,

leads to

(1.7)  $h(1/q_a) = q_a^2\, h(q_a)$.

This is a symmetry relation for the distribution of the auto-quotient of a non-negative
unlimited variate. If $q_a$ is larger than unity,

(1.8)  $h(1/q_a) > h(q_a)$.

If the distribution $h(q_a)$ is continuous for all values of $q_a$, the derivative of
equation (1.7) with respect to $q_a$ leads, for $q_a = 1$, to

(1.9)  $h'(1) = -h(1)$.

If the distribution $h(q_a)$ possesses a unique mode, it must be less than unity.
The moments $\overline{q_a^k}$ are, from (1.5),

$\overline{q_a^k} = \int_{q_a = 0}^{\infty} \int_{y = 0}^{\infty} q_a^k\, y\, f(q_a y)\, f(y)\, dy\, dq_a = \int_{y = 0}^{\infty} \frac{f(y)}{y^k} \int_{q_a y = 0}^{\infty} (q_a y)^k f(q_a y)\, d(q_a y)\, dy$.

The inner integral is the moment $\overline{y^k}$ of order $k$ of the initial variate $y$, and the
remaining integral is its reciprocal moment $\overline{y^{-k}}$ of order $-k$. Thus

(1.10)  $\overline{q_a^k} = \overline{y^k} \cdot \overline{y^{-k}} = \overline{q_a^{-k}}$.

The moments of order $k$ and of order $-k$ of $q_a$ exist if the moments and the
reciprocal moments of order $k$ for the initial variate exist simultaneously. The
second equation in (1.10) also follows immediately from the invariance of $q_a$
under a reciprocal transformation. Even if the initial distribution possesses all
moments, the mean $\bar q_a$ need not exist, and the same holds, of course, for the mean
error and the higher moments. The procedure, usual in economic and meteorological
statistics, of calculating the quotients of two series of independent positive
variables in order to test whether this ratio is constant may be misleading,
especially if the two series happen to be samples taken from the same population.
The theoretical mean need not exist, and the calculated mean of the observed
quotients need not characterize the relation between the two series.

The probability function $H(Q)$ of the quotient $Q$ obtained from (1.3) is

$H(Q) = \int_0^{Q} \int_0^{\infty} y\, f_1(zy)\, f_2(y)\, dy\, dz$.

Change of the order of integration leads to

$H(Q) = \int_0^{\infty} f_2(y)\, F_1(Qy)\, dy$.

The probability function $H(q_a)$ of the auto-quotient obtained from (1.4) is

(1.11)  $H(q_a) = \int_0^{\infty} F(q_a y)\, f(y)\, dy$.

Integration by parts leads to

(1.12)  $H(q_a) = 1 - q_a \int_0^{\infty} F(y)\, f(q_a y)\, dy$.

The boundary conditions, $H(0) = 0$; $H(\infty) = 1$, can immediately be verified if
the preceding equation is written in the form

(1.13)  $H(q_a) = 1 - \int_0^{\infty} F(z/q_a)\, f(z)\, dz$.

The probability $H(q_a)$ possesses a symmetry relation which is analogous to
(1.7). The probability at the value $1/q_a$ is, from (1.11),

$H(1/q_a) = \int_0^{\infty} F(y/q_a)\, f(y)\, dy$.

If we introduce the variable of integration

$z = y/q_a$

we obtain from (1.12)

(1.14)  $H(q_a) = 1 - H(1/q_a)$.

If $q_a$ is any quantile, such that $H(q_a) = P$, its reciprocal $1/q_a$ has the probability
$1 - P$. The first quartile (decile) is the reciprocal of the third quartile (ninth
decile), and so on.

For $q_a = 1$, equation (1.14) leads to

(1.14')  $H(1) = \tfrac{1}{2}$.

The median of the auto-quotient of a positive unlimited variate is unity. From
(1.9) it follows that the median surpasses the mode, if a unique mode exists.
Finally, equation (1.14) may be used to construct a symmetrical distribution.
If a new variate

(1.15)  $z = \lg q_a$

with the probability function $H^*(z)$ is introduced, the symmetry relation (1.14)
becomes

(1.16)  $H^*(z) = 1 - H^*(-z)$.

The logarithm of the auto-quotient of a positive unlimited variate has a symmetrical
distribution about median zero. The geometric mean of $q_a$ exists and is
equal to unity.
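The symmetry relations above can be checked empirically. The sketch below is an illustration under an assumed initial distribution, not part of the authors' argument: it takes the initial variate to be exponential with unit mean, for which the probability function of the auto-quotient is $H(q) = q/(1 + q)$, and the function names are hypothetical.

```python
import math
import random


def auto_quotients(count, seed=0):
    """Quotients x / y of independent unit-mean exponential variates."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        x, y = rng.expovariate(1.0), rng.expovariate(1.0)
        if x > 0.0 and y > 0.0:              # guard against an exact zero draw
            out.append(x / y)
    return out


def empirical_prob(sample, q):
    """Empirical probability function: share of the sample not exceeding q."""
    return sum(1 for value in sample if value <= q) / len(sample)


if __name__ == "__main__":
    qs = auto_quotients(40000)
    print("H(1)          ", empirical_prob(qs, 1.0))
    print("H(2) + H(1/2) ", empirical_prob(qs, 2.0) + empirical_prob(qs, 0.5))
    print("mean of log q ", sum(math.log(q) for q in qs) / len(qs))
```

With a large sample the empirical value of $H(1)$ should lie near one half, $H(2) + H(1/2)$ near unity, and the mean logarithm near zero, in line with (1.14), (1.14') and (1.16). The sample mean of the quotients themselves is unstable for this initial distribution because its reciprocal moment of order one does not exist.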





These results hold if each observed value of a non-negative unlimited variate
is divided by each other observed value. They do not hold for the quotients of
two specific order statistics because, in general, the fundamental assumption of
independence no longer holds. However, some consequences for the quotients
of extreme $m$th values may be deduced.

Consider a symmetrical unlimited variate. Then the distribution ${}_m\varphi({}_m x)$
of the $m$th smallest value ${}_m x$, and the distribution $\varphi_m(x_m)$ of the $m$th largest value
$x_m$ are mutually symmetrical in the sense of (1.4'). Therefore the extremal
quotient

(1.17)  $q_m = \dfrac{x_m}{-\,{}_m x}$

may be interpreted as an auto-quotient provided that 1) the probability for
$x_m$ to be negative, and ${}_m x$ to be positive, may be neglected; 2) the distributions
of the $m$th smallest and the $m$th largest values are independent. Under these
conditions the distribution, the moments, and the probability function of the
extremal quotient are obtained from (1.5), (1.10), and (1.11) respectively, if
the initial distribution $f(y)$ is replaced by the distribution of the $m$th largest
values $\varphi_m(x_m)$. The symmetry relations (1.7) and (1.14) and their consequence,
that the median is equal to unity, hold in particular for $m = 1$, i.e. for the
extremal quotient proper.

The validity of the two conditions has now to be established. 

a) Consider a symmetrical distribution $f(x)$ with median zero. Then the
probability that the largest among $n$ observations, $x_n$, exceeds a
certain $x$ is $1 - F^n(x)$. The probability $P$ that the largest among $n$ values is
positive, i.e. larger than the median, is

(1.18)  $P = 1 - 2^{-n}.$

If $n$ is sufficiently large, this probability differs from unity by an amount that
can be made as small as we please. Even for relatively small samples, say $n = 20$,
the probability that the largest value will be positive is of the order $1 - 10^{-6}$.
Thus, we expect only one largest value in a million samples of size 20 to be
negative. The same argument shows that the smallest value $x_1$ may be expected to
be negative. Thus the postulate

(1.19)  $x_n \ge 0$; $x_1 \le 0$,

is a very weak restriction upon the sample size. If $m$ is sufficiently small, the
same result holds for the $m$th extremes.
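The size of the neglected probability in (1.18) is easy to tabulate. The following is a minimal sketch, with a hypothetical function name, that evaluates $2^{-n}$ for a few sample sizes under the stated assumption of a symmetrical distribution with median zero.

```python
def prob_largest_negative(n):
    """Probability that the largest of n values from a symmetrical
    distribution with median zero is negative: 2**(-n), the complement
    of P in (1.18)."""
    return 0.5 ** n

for n in (5, 10, 20):
    # Columns: sample size, P from (1.18), neglected probability.
    print(n, 1.0 - prob_largest_negative(n), prob_largest_negative(n))
```

For $n = 20$ the neglected probability is a little under one in a million, which is the order of magnitude quoted in the text.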

b) It is known [7] that the joint distribution $w_n(x_1, x_n)$ of the extremes taken
from an initial distribution of the exponential type converges, for sufficiently
large samples, toward the product of the asymptotic distribution $\varphi(x_n)$ of the
largest value, and ${}_1\varphi(x_1)$ of the smallest value. A similar theorem will now be
proven for a general class of continuous distributions.





Let ${}_m x$ be the $m$th smallest observation; let $x_l$ be the $l$th largest observation
where $m$ and $l$ are small compared to $n$, $n$ being large. Then the joint distribution
$w_n({}_m x, x_l)$ is

(1.20)  $w_n({}_m x, x_l) = \dfrac{n!}{(m-1)!\,(l-1)!\,(n-m-l)!}\, F^{m-1}({}_m x)\, \bigl(F(x_l) - F({}_m x)\bigr)^{n-m-l}\, \bigl(1 - F(x_l)\bigr)^{l-1} f({}_m x)\, f(x_l).$

Now the transformation

(1.21)  $n(1 - F(x_l)) = \xi$; $nF({}_m x) = \eta$; $0 \le \xi \le n$; $0 \le \eta \le n$,

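due to Cramér ([1], p. 371) is used. Then the joint distribution $v_n(\xi, \eta)$ of the
new variates $\xi$ and $\eta$ becomes

$v_n(\xi, \eta) = \dfrac{n!}{(n-m-l)!\, n^{m+l}}\; \dfrac{\xi^{l-1}\, \eta^{m-1}}{(l-1)!\,(m-1)!} \left(1 - \dfrac{\xi + \eta}{n}\right)^{n-m-l},$

where $m + l$ is small compared to $n$. As $n$ increases, $v_n(\xi, \eta)$ converges to

$v(\xi, \eta) = \dfrac{\xi^{l-1} e^{-\xi}}{(l-1)!} \cdot \dfrac{\eta^{m-1} e^{-\eta}}{(m-1)!},$

so that in the limit $\xi$ and $\eta$ are independent. If now the mild restriction is
imposed that $F(x)$ be monotonically increasing, (1.21) defines a one to one
transformation, and therefore there must exist an inverse function uniquely defining
${}_m x$ as a function of $\eta$, and $x_l$ as a function of $\xi$. From the limiting independence
of $\xi$ and $\eta$ the limiting independence of the extremes ${}_m x$ and $x_l$ follows at once.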

Thus the second condition is fulfilled, and the $m$th extremal quotient shares
all properties of the auto-quotient. This holds also for initial symmetrical
distributions which do not possess asymptotic distributions of the extremes.

In the following, the two types of initial distributions of an unlimited variate
are considered for which asymptotic distributions of the extremes exist, namely,
the exponential and the Cauchy type. For simplicity, only the extremal quotient
proper, designated by $q$, is studied. The two asymptotic probabilities of the
extremal quotients for these symmetrical distributions are obtained by
introducing the asymptotic distributions of the largest value into the probability
function (1.11) of the auto-quotient.


2. Application to the exponential type. For symmetrical distributions of the
exponential type the asymptotic distribution of the largest value is

(2.1)  $\varphi(x) = \alpha \exp\left[-\alpha(x - u) - e^{-\alpha(x - u)}\right],$

where $u$ and $\alpha$ are defined in terms of the initial probability $F(x)$ and the initial
distribution $f(x)$ by

(2.2)  $F(u) = 1 - 1/n$; $\alpha = n f(u)$,

$n$ being the sample size. The distribution (2.1) will now be simplified by
introducing a new parameter $\lambda$ defined by

(2.3)  $\lambda = e^{\alpha u}$; $\lambda > 0$.





To see the meaning of $\lambda$, consider Laplace's first distribution, then the so
called logistic [6], and the normal distributions, all of which are of the exponential
type. In the first two cases we obtain, from (2.2), after some calculations,

(2.4)  $\alpha = 1$, $u = \lg n - \lg 2$; $\alpha = 1 - 1/n$, $u = \lg (n - 1)$,

whereas for the normal distribution, we have asymptotically

$\alpha = u = \sqrt{2 \lg (n/\sqrt{2\pi})}$

and

(2.4')  $\lambda = n^2/(2\pi).$

For these distributions, and interpreted in this sense, $\lambda$ is of the order of the
sample size or its square.

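From (2.3) and (2.1) the distribution $\varphi(x)$ and the probability function $\Phi(x)$
are

(2.5)  $\varphi(x) = \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right]$; $\Phi(x) = \exp\left[-\lambda e^{-\alpha x}\right].$

In order to fulfill the condition (1.19), namely $\Phi(0) = 0$, the distribution $\varphi(x)$
must be truncated at $x = 0$. This leads to the truncated distribution $\varphi_t(x)$ and
the truncated probability $\Phi_t(x)$ where

(2.6)  $\varphi_t(x) = \dfrac{\alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right]}{1 - e^{-\lambda}}$; $\Phi_t(x) = \dfrac{\exp\left[-\lambda e^{-\alpha x}\right] - e^{-\lambda}}{1 - e^{-\lambda}}.$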

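The asymptotic probability function $H_\lambda(q)$ for the extremal quotient of a
symmetrical variate of the exponential type is now obtained from (1.11), if $y$, $f(y)$,
and $F(y)$, are replaced by $x$, $\varphi_t(x)$ and $\Phi_t(x)$, respectively, and the index $a$ is
dropped. Consequently, from (2.6),

$H_\lambda(q) = \dfrac{1}{(1 - e^{-\lambda})^2} \int_0^\infty \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x} - \lambda e^{-\alpha q x}\right] dx - \dfrac{e^{-\lambda}}{(1 - e^{-\lambda})^2} \int_0^\infty \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right] dx.$

The transformation

$e^{-\alpha x} = z$; $\alpha e^{-\alpha x}\, dx = -dz$

leads to

(2.7)  $H_\lambda(q) = \dfrac{1}{(1 - e^{-\lambda})^2} \int_0^1 \lambda e^{-\lambda(z + z^q)}\, dz - \dfrac{e^{-\lambda}}{1 - e^{-\lambda}}.$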

This probability of the extremal quotient for initial symmetrical distributions
of the exponential type is not truly asymptotic since the parameter $\lambda$ depends
upon $n$. (See Addendum.)

Unfortunately, the expression (2.7) cannot be integrated in closed form. Therefore the
probability function has to be studied in an analytic way. For this purpose we first
recall the general properties

$H(0) = 0$; $H(1) = 1/2$; $H(\infty) = 1$,

valid for any value of $\lambda$. Furthermore, for any $\lambda$, we have the symmetry
relation (1.14). These properties can be verified at once from (2.7).





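The numerical values of $H_\lambda(q)$ can easily be calculated for $q = 1/2$ and $q = 2$.
Consider a value of $\lambda$ large enough that $e^{-\lambda}$ may be neglected. Then formula (2.7)
may be written

$H_\lambda(2) = \int_0^1 \lambda e^{-\lambda(z + z^2)}\, dz = \sqrt{\lambda}\, e^{\lambda/4} \int_0^1 e^{-\lambda(z + 1/2)^2} \sqrt{\lambda}\, dz.$

If we introduce

$\sqrt{\lambda}\,(z + \tfrac{1}{2}) = t/\sqrt{2}$; $\sqrt{\lambda}\, dz = dt/\sqrt{2}$,

the probability $H_\lambda(2)$ becomes a difference of two normal probability integrals,

$H_\lambda(2) = \sqrt{\pi\lambda}\, e^{\lambda/4} \left[ 1 - F\left(\sqrt{\lambda/2}\right) - \left(1 - F\left(3\sqrt{\lambda/2}\right)\right) \right],$

where $F$ stands for the normal probability function.

The second expression may be neglected compared to the first one for $\lambda \ge 4$,
whence

(2.9)  $H_\lambda(2) = \sqrt{\pi\lambda}\, e^{\lambda/4} \left[1 - F\left(\sqrt{\lambda/2}\right)\right].$

The symmetry relation (1.14) leads to the knowledge of $H_\lambda(1/2)$. Thus the three
probabilities $H_\lambda(1/2)$, $H(1)$, and $H_\lambda(2)$ are known.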

To see the influence of $\lambda$ on $H_\lambda(2)$, we use a method due to R. D. Gordon [4].
This author considers a function $R_x$ defined by

$R_x = e^{x^2/2} \int_x^\infty e^{-t^2/2}\, dt$, $x > 0$,

and proves that

$\dfrac{dR}{dx} = xR - 1 < 0$; $\dfrac{d^2R}{dx^2} = x\dfrac{dR}{dx} + R > 0.$

It follows that

(2.10)  $\dfrac{d}{dx}(xR) > 0.$

If we substitute $\sqrt{\lambda/2}$ for $x$, this inequality may be written, from (2.9) and (2.10),

$\dfrac{d}{dx}(xR) = 2\sqrt{2\lambda}\, \dfrac{dH_\lambda(2)}{d\lambda} > 0.$

Consequently $H_\lambda(2)$ increases with $\lambda$ whereas, from (1.14), the probability
$H_\lambda(1/2)$ decreases with $\lambda$. The following table gives the probabilities $H_\lambda(2)$
from (2.9), $H_\lambda(1/2)$, and their differences

(2.11)  $P_\lambda(2) = H_\lambda(2) - H_\lambda(1/2).$





Asymptotic probabilities of the extremal quotient for symmetrical distributions of
the exponential type

Parameter      Probabilities (2.9), (1.14)      Probability (2.11)
    λ          H_λ(2)          H_λ(1/2)         P_λ(2)

    8          .84376          .15624           .68752
   18          .91377          .08623           .82754
   32          .94661          .05339           .89322
   50          .96438          .03562           .92876
   72          .97427          .02573           .94854
   98          .98087          .01913           .96174


The approximate shape of $H_\lambda(q)$ is traced, for $\lambda = 8, \ldots, 98$, and $1/2 \le q \le 2$
in Graph (1). Since we know from (1.16) that $\lg q$ has a symmetrical
distribution, we use logarithmically normal probability paper where $q$ is plotted on
the abscissa in a logarithmic scale, and $H_\lambda(q)$ is plotted on the ordinate in a
normal probability scale. The probability $P_\lambda(2)$ for any value of $q$ to be
contained in the interval $1/2 < q < 2$ increases with $\lambda$, i.e., with the sample size, and
the distribution of the extremal quotient contracts.
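The tabulated probabilities can be spot-checked numerically. The sketch below is an illustration under a stated assumption: it takes the approximation (2.9) to read $H_\lambda(2) = \sqrt{\pi\lambda}\, e^{\lambda/4}\,[1 - F(\sqrt{\lambda/2})]$, with $F$ the normal probability function, and derives $H_\lambda(1/2)$ and $P_\lambda(2)$ from (1.14) and (2.11). The function names are hypothetical.

```python
import math

def normal_upper_tail(x):
    """1 - F(x) for the standard normal probability function F."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def h_lambda_2(lam):
    """Approximation (2.9), as read here:
    sqrt(pi*lam) * exp(lam/4) * (1 - F(sqrt(lam/2)))."""
    tail = normal_upper_tail(math.sqrt(lam / 2.0))
    return math.sqrt(math.pi * lam) * math.exp(lam / 4.0) * tail

for lam in (8, 18, 32, 50, 72, 98):
    h2 = h_lambda_2(lam)
    # H(1/2) = 1 - H(2) by (1.14); P(2) = H(2) - H(1/2) by (2.11).
    print(lam, round(h2, 5), round(1.0 - h2, 5), round(2.0 * h2 - 1.0, 5))
```

The recomputed values follow the printed table closely for most entries and increase with $\lambda$ as the argument based on Gordon's inequality requires; differences of the order of 0.001 from the printed figures are to be expected in places.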


[Graph 1. Asymptotic probability of the extremal quotient for the exponential type.]



If the initial distribution is unknown, the parameter $\lambda$ has to be estimated
from the observed extremal quotients. Equation (2.11) may be used for this
purpose. We calculate the observed relative frequency of extremal quotients
contained between $q = 1/2$ and $q = 2$, and substitute it for the probability
$P_\lambda(2)$. To facilitate this estimate of $\lambda$, we trace $P_\lambda(2)$ against $\lambda$ in graph (2).
The probability $P_\lambda(2)$ is traced on the ordinate in linear scale, and the parameter
$\lambda$ is traced on the abscissa in inverse scale. Thus $\lambda$ is easily estimated from the
observed relative frequency.

[Graph 2. Estimation of the parameter λ.]



The distribution $h_\lambda(q)$ of the extremal quotient obtained by differentiating
the probability function (2.7) with respect to $q$ is

(2.12)  $h_\lambda(q) = \dfrac{\lambda^2}{(1 - e^{-\lambda})^2} \int_0^1 e^{-\lambda(z + z^q)}\, z^q\, (-\lg z)\, dz.$

The symmetry relation (1.7) is easily verified. We now investigate the boundary
value $h_\lambda(0)$ and prove that

(2.13)  $\lim_{q \to 0} h_\lambda(q) = h_\lambda(0).$

This is not obvious, since $z^q$ becomes indeterminate if both $z$ and $q$ vanish. For
the proof of (2.13), consider the integral

(2.14)  $I = \lambda \int_0^1 e^{-\lambda z} (-\lg z)\, dz$





or

(2.15)  $I = (1 - e^{-\lambda}) \lg \lambda - \gamma + e^{-\lambda} \lg \lambda - Ei(-\lambda).$

The last term, the exponential integral, is positive. The value of $h_\lambda(0)$ is thus
from (2.12)

(2.16)  $h_\lambda(0) = \dfrac{\lambda e^{-\lambda} \left(\lg \lambda - \gamma - Ei(-\lambda)\right)}{(1 - e^{-\lambda})^2}.$

The difference

$\Delta = (1 - e^{-\lambda})^2 \left(h_\lambda(q) - h_\lambda(0)\right)$

becomes, from (2.12), (2.15) and (2.16), by the use of the mean value theorem
and after expansion

$\Delta = f(\lambda) \int_0^1 \left(e^{-\lambda z^q} z^q - e^{-\lambda}\right) dz = f(\lambda) \sum_{\nu=0}^{\infty} \dfrac{(-1)^\nu \lambda^\nu}{\nu!} \left(\dfrac{1}{(\nu + 1)q + 1} - 1\right),$

where $f(\lambda)$ is a positive function. Since the series is absolutely convergent, the
difference $\Delta$ vanishes for $q = 0$, and the density of probability for $q = 0$ is given
by (2.16). The condition $h_\lambda(0) \ge 0$, valid for any distribution, is met provided
that

(2.17)  $\lambda > 1.794.$

By virtue of (2.4) this is a (weak) condition concerning the sample size. From
(2.16) it follows that $h_\lambda(0)$ does not vanish although its numerical value is very
small.

The existence of at least one mode follows from the fact that the distribution
$h_\lambda(q)$ is continuous, very small for $q = 0$, and vanishes for $q = \infty$. Equation
(1.9) proves that any mode is inferior to unity. The distribution contracts for
increasing values of the parameter. Therefore the mode approaches the median
with increasing sample size.

Since the distributions of the exponential type do not possess reciprocal
moments it follows from (1.10) that the distribution $h_\lambda(q)$ does not possess moments.
The mean extremal quotient $\bar{q}$ diverges. Because the logarithmically normal
distribution used in graph (1) as first approximation to the distribution $h_\lambda(q)$
possesses all moments, the distribution $h_\lambda(q)$ has a much longer tail than the
logarithmically normal one.


3. Application to the Cauchy type. For the exponential type, the asymptotic 
distribution of the extremal quotient can only be expressed in the form of an 
integral containing a parameter X which is a function of the sample size. For the 
Cauchy type, to be defined in the following, the asymptotic distribution will 
turn out to be very simple. 





A distribution of a variate $x \ge 1$ was said [5] to be of the Pareto type if

(3.1)  $\lim_{x \to \infty} x^k (1 - F(x)) = A$; $k > 0$; $A > 0$.

We now say that a variate is of the Cauchy type if it is unlimited, continuous,
subject to (3.1), and symmetrical about zero. Distributions of the Pareto and
the Cauchy type do not possess moments of an order equal to or larger than $k$.
However, not all unlimited symmetrical distributions with a finite number of
moments are of the Cauchy type.

The simplest example of such a distribution is the Cauchy distribution itself

(3.2)  $f(x) = \dfrac{1}{\pi(1 + x^2)}$; $F(x) = \dfrac{1}{2} + \dfrac{1}{\pi} \operatorname{arc\,tg} x$,

which possesses no moments. For large absolute values of $x$, the usual expansion
leads to

$F(x) = 1 - \dfrac{1}{\pi x} + O(x^{-2})$; $F(-x) = \dfrac{1}{\pi x} - O(x^{-2}).$

If the factors $O(x^{-2})$ are neglected, the parameters $A$ and $k$ in (3.1) are

(3.2')  $A = \pi^{-1}$; $k = 1$.

For the Cauchy type, the asymptotic probability $\Pi(x)$ and distribution $\pi(x)$
of the largest value $x = x_n$ established by Fréchet [3], R. A. Fisher [2] and R. von
Mises [8] are

(3.3)  $\Pi(x) = \exp\left[-\left(\dfrac{u}{x}\right)^k\right]$; $\pi(x) = \dfrac{k}{u} \left(\dfrac{u}{x}\right)^{k+1} \exp\left[-\left(\dfrac{u}{x}\right)^k\right],$

where $u$ is defined by (2.2).

The condition (1.19) is fulfilled for any sample size which is so large that the
asymptotic distribution of the extremes may be used. The asymptotic
probability $H_k(q)$ of the extremal quotient for the Cauchy type is obtained from (1.11),
if $y$, $f(y)$ and $F(y)$ are replaced by $x$, $\pi(x)$, and $\Pi(x)$, respectively, where the
indices $n$ and $a$ are omitted. Consequently, from (3.3),

$H_k(q) = \int_0^\infty \dfrac{k}{u} \left(\dfrac{u}{x}\right)^{k+1} e^{-(u/x)^k - (u/qx)^k}\, dx.$

From the transformation

$z = (u/x)^k$

the asymptotic probability $H_k(q)$ and the asymptotic distribution $h_k(q)$ of the
extremal quotient become simply

(3.4)  $H_k(q) = \dfrac{q^k}{1 + q^k}$; $h_k(q) = \dfrac{k q^{k-1}}{(1 + q^k)^2}.$




Evidently, the symmetry relations (1.7) and (1.14) are fulfilled for any $k$. The
graphs (3) and (4) show the distribution $h_k(q)$ and the probability $H_k(q)$ for
the most interesting cases $k = 1, 2, 3$. From

$\dfrac{\partial H_k(q)}{\partial k} = \dfrac{q^k \lg q}{(1 + q^k)^2}$

it follows: For $k$ increasing, the probability $H_k(q)$ decreases for $q < 1$, and
increases for $q > 1$. The distribution contracts with increasing values of the parameter
$k$ as shown in the graphs (3) and (4). The more moments that exist in the initial
distribution, the more concentrated is the distribution of the extremal quotient.



[Graphs 3 and 4. Abscissa: extremal quotient q.]


The density of probability

$h_k(1) = k/4$

of the median obtained from (3.4) and (1.14') increases with $k$. The mode $\tilde{q}$ of
the extremal quotient is obtained from (3.4). For $k > 1$ this leads to

(3.5)  $\tilde{q}^{\,k} = \dfrac{k - 1}{k + 1}.$











For $k \le 1$ no mode exists, and the distribution diminishes with $q$. The larger
$k$, the smaller is the distance from the median to the mode, and hence, the
smaller the asymmetry. The density of probability of the mode increases with
$k$, and the probability

(3.6)  $H_k(\tilde{q}) = \tfrac{1}{2}(1 - 1/k)$

approaches $\tfrac{1}{2}$. The distribution (3.4) belongs to the Pareto type and has no
moments of an order equal to or greater than $k$.
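The properties just stated are simple enough to verify directly. The sketch below is illustrative and rests on an assumption: it takes (3.4) to read $H_k(q) = q^k/(1 + q^k)$ with $h_k(q)$ its derivative, and the mode to satisfy $q^k = (k - 1)/(k + 1)$ for $k > 1$. The function names are hypothetical.

```python
def H(q, k):
    """Probability function (3.4), taken here as q**k / (1 + q**k)."""
    return q ** k / (1.0 + q ** k)

def h(q, k):
    """Distribution (3.4): derivative of H with respect to q."""
    return k * q ** (k - 1) / (1.0 + q ** k) ** 2

def mode(k):
    """Mode from q**k = (k - 1)/(k + 1); defined only for k > 1."""
    if k <= 1:
        raise ValueError("no mode exists for k <= 1")
    return ((k - 1.0) / (k + 1.0)) ** (1.0 / k)

for k in (2, 3, 10):
    q_mode = mode(k)
    # Columns: k, mode, H at the mode, (1 - 1/k)/2 from (3.6),
    # density at the median, k/4.
    print(k, q_mode, H(q_mode, k), 0.5 * (1.0 - 1.0 / k), h(1.0, k), k / 4.0)
```

The printout shows the mode moving toward the median $q = 1$ and the probability at the mode moving toward one half as $k$ grows.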

In $N$ samples of sufficiently large size $n$, the largest quotient $q_N$, defined in
the same way as $u$ in equation (2.2), obtained from (3.4)

(3.7)  $q_N^k = N - 1$

increases as a root of the number of samples, i.e. very quickly. The higher the
order of the highest moments existing, the smaller will the expected largest
quotient be.

From (3.4) and the symmetry (1.14) we obtain

(3.8)  $H_k(q) - H_k(1/q) = 1 - 2/(1 + q^k).$

The larger $k$, the larger is the percentage of the observations contained in the
interval $1/q$ to $q$.

For a systematic estimate of $k$, the transformation (1.15) is used. Formula
(3.4) leads to the probability $H^*(z)$ and the distribution $h^*(z)$ where

(3.9)  $H^*(z) = \dfrac{1}{1 + e^{-kz}}$; $h^*(z) = \dfrac{k e^{-kz}}{(1 + e^{-kz})^2}.$

The logarithm of the extremal quotient for initial distributions of the Cauchy
type (where no moments of an order equaling or exceeding $k$ exist) has the
logistic distribution [6], as has the midrange $v = x_n + x_1$ for distributions of the
exponential type (where all moments exist). The logarithm of the extremal
quotient plotted on logistic probability paper should be scattered around a
straight line.

The order $k$ of the lowest moment which diverges is obtained from the
variance $\sigma_z^2$ of the distribution $h^*(z)$ which is [6]

(3.10)  $\sigma_z^2 = \dfrac{\pi^2}{3k^2}.$

For the estimate of $k$ from (3.10), $\sigma_z^2$ is replaced by the estimate $s_z^2$ obtained from

(3.11)  $s_z^2 = \dfrac{1}{N} \sum_{v=1}^{N} \lg^2 \left( \dfrac{x_{n,v}}{-x_{1,v}} \right).$

For the Cauchy distribution itself, $k = 1$, and the probability and the
distribution of the extremal quotient

$H_1(q) = q/(1 + q)$; $h_1(q) = (1 + q)^{-2}$

are similar to the initial distribution.

The asymptotic distribution of the extremal quotient for initial distributions
of the Cauchy type contains one parameter only, the order of the lowest
diverging moment in the initial distribution. All other traces of the initial distribution
have disappeared.

4. Comparison of the extremal properties for the two types of initial distribu¬ 
tions. Assume that the initial distribution is symmetrical, unlimited, and pos¬ 
sesses an asymptotic distribution of the extremes. This is not always fulfilled. 
All moments may exist, and yet the distribution may not belong to the expo¬ 
nential type. No moments may exist, and yet the distribution may not belong 
to the Cauchy type. If the assumption holds, the initial distribution belongs 
either to the Cauchy, or to the exponential type. 

We take $N$ samples of size $n$, and estimate the median of the population
from the central value $m$ of the $N$ central values of the samples. Let $X_{1,v}$ and
$X_{n,v}$ ($v = 1, 2, \ldots, N$) be the two extremes. If it happens for any $v$ that

$X_{1,v} > m$ or $X_{n,v} < m$

the sample is too small, and its size has to be increased. The central value of
the observed extremal quotients $q_v = (X_{n,v} - m)/(m - X_{1,v})$ must be near
unity.

If the initial distribution is of the exponential type, all moments in the
population exist, and the midrange has the logistic distribution. If the initial
distribution is of the Cauchy type, no moments of an order equal to or greater than $k$
exist, and the logarithm of the extremal quotient has the logistic distribution. The
order $k$ can be estimated from the variance (3.11). If all moments in the population
diverge, the calculation of the observed moments is futile since they do not
characterize the population.

Addendum. The referee of this paper has suggested the following method for
obtaining an asymptotic distribution of the extremal quotient for the
exponential type. For large values of $\lambda$, formula (2.7) becomes, approximately,

$H_\lambda(q) = \int_0^\lambda \exp\left\{-y\left[1 + \left(\dfrac{y}{\lambda}\right)^{q-1}\right]\right\} dy.$

The further transformation

$e^t = \lambda^{q-1}$, $q - 1 = t/\lg \lambda$

leads to the probability $H^*(t)$ of the variate $t$

$H^*(t) = \int_0^\lambda \exp\left\{-y\left[1 + e^{-t} y^{t/\lg \lambda}\right]\right\} dy,$

whence asymptotically for $\lambda \to \infty$

$H^*(t) = \int_0^\infty \exp\left\{-y(1 + e^{-t})\right\} dy = 1/(1 + e^{-t}).$

Therefore the logistic distribution holds at the same time for both initial types,
using the transformation $t = \alpha u(q - 1)$ for the exponential type, and the
logarithmic transformation for the Cauchy type.


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] R. A. Fisher and L. H. C. Tippett, “Limiting forms of the frequency distribution
of the smallest and the largest member of a sample,” Proc. Camb. Philos. Soc.,
Vol. 24 (1928), p. 180.

[3] M. Fréchet, “Sur la loi de probabilité de l’écart maximum,” Annales Soc. Polon. Math.,
Vol. 6 (1927).

[4] R. D. Gordon, “Values of Mills’ ratio of area to bounding ordinate and of the normal
probability integral for large values of the argument,” Annals of Math. Stat.,
Vol. 12 (1941), pp. 364-366.

[5] E. J. Gumbel, “The return period of flood flows,” Annals of Math. Stat., Vol. 12 (1941),
pp. 163-190.

[6] E. J. Gumbel, “Ranges and midranges,” Annals of Math. Stat., Vol. 15 (1944),
pp. 414-422.

[7] E. J. Gumbel, “On the independence of the extremes in a sample,” Annals of Math.
Stat., Vol. 17 (1946), pp. 78-81.

[8] R. von Mises, “La distribution de la plus grande de n valeurs,” Revue Math. de l’Union
Interbalkanique, Vol. 1 (1936).



ON A PRELIMINARY TEST FOR POOLING MEAN SQUARES 
IN THE ANALYSIS OF VARIANCE 1 

By A. E. Paull 

Grain Research Laboratory, Winnipeg 

Summary. The paper describes the consequences of performing a preliminary
F-test in the analysis of variance. The use of the 5% or 25% significance level
for the preliminary test results in disturbances that are frequently large enough
to lead to incorrect inferences in the final test. A more stable procedure is
recommended for performing the preliminary test in which the two mean squares
are pooled only if their ratio is less than twice the 50% point.

I. Introduction 

The problem discussed in this paper is one of a large class involving preliminary
tests of significance. Studies of this type have recently been made by Bancroft
[1] and Mosteller [2]. Bancroft dealt with a preliminary test for homogeneity
of two variances, and a test of a regression coefficient. Mosteller dealt with the
problem of pooling means from two normal populations having the same known
variance. The present problem is an extension of Bancroft’s work from
investigations of the bias and variance of an estimate of variance, to investigations of the
consequences of using that estimate in performing a further test of significance.

The problem arises frequently in the analysis of variance. As a simple example, 
consider an experiment carried out to test the hypothesis that different labora¬ 
tories in a district all determine the protein content of wheat without systematic 
differences between laboratories. Three laboratories are selected at random 
and each is requested to analyze ten samples of the same wheat, five on each of 
two days. The analysis of variance would be set up in one of two ways: 


MODEL I

Source of variation            df      MS

Between laboratories            2      $v_3$
Between days within labs        3      $v_2$
Within days                    24      $v_1$


MODEL II

Source of variation            df      MS

Between laboratories            2      $v_3$
Within laboratories            27      $(3v_2 + 24v_1)/27$


The soundest procedure is to follow Model I in which the F-ratio, $v_3/v_2$,
provides a valid though not very powerful test of the null hypothesis. But the
investigator often doubts that this is the most effective form of analysis. His past
experience may have shown that measurements of this kind seldom exhibit
day-to-day variations appreciably greater than their within-day variations.
If he is willing to accept this credible assumption, he adopts Model II because
this increases the degrees of freedom from 2 and 3 to 2 and 27. These two models
may conveniently be called the “never pool” and the “always pool” procedures.

1 Based on a doctoral dissertation submitted to the Faculty of North Carolina State
College of the University of North Carolina at Raleigh, N. C., in June, 1948. Published as
Paper No. 107 of the Grain Research Laboratory, Board of Grain Commissioners, Winnipeg.

The investigator often prefers what may be called a “sometimes pool”
procedure. He starts with Model I and examines the null hypothesis that the
variation between days is no greater than the variation within days by testing
the F-ratio $v_2/v_1$. For this test, he selects a probability level $P_1$ that may be the
5% or some higher level. If the hypothesis of this preliminary test is not rejected,
his judgement has been substantiated and he adopts Model II and pools the
two mean squares. If the hypothesis is rejected, he retains Model I since he
concludes that $v_2$ alone is the only valid estimate of error.

The following notation is introduced:

Degrees of freedom      Mean square      Expected value of mean square

      $n_3$                 $v_3$                 $\sigma_3^2$
      $n_2$                 $v_2$                 $\sigma_2^2$
      $n_1$                 $v_1$                 $\sigma_1^2$

where $\sigma_1^2 \le \sigma_2^2 \le \sigma_3^2$.

The mean squares $v_3$, $v_2$, and $v_1$ are assumed to be distributed as central
chi-squares. This assumption is justified if the treatments (laboratories in the
example) are selected at random from a population of treatments. But if, as is
more frequently the case, the experimenter is interested only in specified
treatments, the non-central chi-square model is the appropriate one. However, if
the two cases are sufficiently parallel, as seems probable, conclusions drawn
from the central model may be expected to apply to the non-central model.

Let $\theta_{32} = \sigma_3^2/\sigma_2^2$ and $\theta_{21} = \sigma_2^2/\sigma_1^2$, and let $F(\nu_1, \nu_2; P)$ denote the value exceeded
by $F$ for $\nu_1$ and $\nu_2$ degrees of freedom with probability $P$. The rule of procedure
for the “sometimes pool” test may be restated as follows:

Reject the main hypothesis that $\sigma_3^2 = \sigma_2^2$ ($\theta_{32} = 1$) if

$v_2/v_1 \ge F(n_2, n_1; P_1)$ and $v_3/v_2 \ge F(n_3, n_2; P_2)$

or if

$v_2/v_1 < F(n_2, n_1; P_1)$ and $(n_2 + n_1)v_3/(n_2 v_2 + n_1 v_1) \ge F(n_3, n_2 + n_1; P_3)$.

The “never pool” procedure in which $P_2$ is used, and the “always pool” procedure
in which $P_3$ is used, may be considered as special cases of the “sometimes pool”
procedure in which $P_1$ takes on its extreme values, 1 and 0 respectively. In
practice, the probability levels $P_2$ and $P_3$ are usually the same; in the present
study they are allowed to be different in case this greater flexibility should prove
desirable. The objects of the investigation are: (a) to examine the Type I error
under the above rule of procedure, i.e., to determine the frequency of rejecting
the null hypothesis when it is true; and (b) to examine the behaviour of the power
with particular reference to comparisons with the power of the “never pool”
procedure.
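The rule of procedure can be written out as a short decision function. The sketch below is an illustration under stated assumptions rather than a definitive implementation: it reads the rule as "pool when $v_2/v_1$ falls below $F(n_2, n_1; P_1)$, otherwise test against $v_2$ alone", and it expects the three critical values to be supplied by the caller from F tables. The function and argument names are hypothetical.

```python
def sometimes_pool_reject(v1, v2, v3, n1, n2, n3, f_prelim, f_never, f_always):
    """Return (reject, pooled) for the "sometimes pool" rule.

    f_prelim = F(n2, n1; P1), f_never = F(n3, n2; P2) and
    f_always = F(n3, n2 + n1; P3) are critical values supplied by the caller.
    n3 is accepted only to mirror the notation of the text.
    """
    if v2 / v1 >= f_prelim:
        # Preliminary test significant: keep v2 alone as the error term.
        return v3 / v2 >= f_never, False
    # Preliminary test not significant: pool v2 and v1.
    pooled_ms = (n2 * v2 + n1 * v1) / (n2 + n1)
    return v3 / pooled_ms >= f_always, True

# Laboratory example: n1 = 24, n2 = 3, n3 = 2. The mean squares are made up;
# 3.01, 9.55 and 3.35 are approximate 5% points of F(3, 24), F(2, 3), F(2, 27).
print(sometimes_pool_reject(1.0, 1.5, 6.0, 24, 3, 2, 3.01, 9.55, 3.35))
print(sometimes_pool_reject(1.0, 4.0, 6.0, 24, 3, 2, 3.01, 9.55, 3.35))
```

In the first call the preliminary ratio is small, the mean squares are pooled, and the main hypothesis is rejected; in the second the preliminary test forbids pooling and the less powerful test against $v_2$ does not reject.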

The remainder of this paper is divided into four sections: Part II contains a
general discussion of the results, conclusions and recommendations; and Part III
illustrates the general conclusions with numerical examples. The derivation of
distributions, proofs by elementary arguments of general qualitative results, and
derivations of closed form expressions for $n_3 = 2$, are given in Part IV.

II. General Discussion on Results, Conclusions and Recommendations

2.1. Criterion employed. In this part the principal results and
recommendations are discussed for the reader who is not interested in the mathematical
details. To give results in a simple form is not easy, because of the many variables
(the $P$’s, the $\theta$’s, and the $n$’s) that enter into the problem. It may be helpful
to consider what is wrong with the “always pool” test, and then to state the
properties which the preliminary test must have if it is to be regarded as useful
and successful.

If the “always pool” procedure is employed when in fact $\sigma_2^2$ is greater than
$\sigma_1^2$, i.e. $\theta_{21} > 1$, the denominator in the final F test tends to be too small. Thus
the final F test gives too many significant results when its null hypothesis is
true and if $\theta_{21}$ is great enough, there is no bound to this hidden distortion of the
significance level. A test which the research worker thinks is being made at the
5% level might actually be at, say, the 47% level.

The preliminary test represents an attempt to avoid this alarming disturbance,
since if $\theta_{21}$ is very large the test is expected to warn against pooling. Such a
procedure, however, can not be expected to remove this disturbance completely,
and it does not do so, but to be successful it should keep the true or effective
significance level of the final F test close to the nominal level at which the
research worker thinks he is working.

A second requirement is that the preliminary test should increase the power in
the final F test relative to the power of the “never pool” test. When the powers of
the “sometimes pool” and “never pool” tests are compared, it is important to
make the comparison at the same significance level. Suppose the preliminary test
shifts the significance level of the final F test from the 5% to the 6% level, a
disturbance that for some uses would not be regarded as serious. In this event the
“sometimes pool” test (at the 6% level) would tend to be more powerful than
the “never pool” test at the 5% level, because an increase in significance level
generally results in an increase in power. But unless the “sometimes pool” test
has more power than a “never pool” test made also at the 6% level, it has no
real advantage over the “never pool” procedure.

2.2. Effect of preliminary tests made at the 5% level. Probably the most
common procedure in practice is to perform the preliminary test at the 5% level
(i.e. $P_1 = .05$) and, whether pooling is prescribed or not, to conduct the final F
test also at the 5% level (i.e. $P_2 = P_3 = .05$). Such a procedure, except when
$\theta_{21}$ is near one and the null hypothesis is true, results in the null hypothesis being
rejected more frequently than if pooling is never resorted to.

When the ratio $\theta_{21}$ is equal to one, so that routine pooling would be valid, the
preliminary test is effective. The true significance level of the final F test is
decreased slightly, but is always confined between the 5% and the 4.75% levels.
Further, the power is always greater than that of the “never pool” test made
at the same significance level.

As $\theta_{21}$ increases from 1, the true significance level of the final F test increases
to a maximum and then slowly decreases to 5%. Unfortunately the maximum
need not be near to 5%: in the example presented later it is about 15%, and for a
broad range of values of $\theta_{21}$ the true significance level is higher than 10%.
Comparison with the power of the “never pool” test is also unfavorable to the
“sometimes pool” test. For values of $\theta_{21}$ near 1, the “sometimes pool” test has the
higher power, but as $\theta_{21}$ becomes larger the advantage passes to the “never
pool” test.

When $\theta_{21}$ is very large there is, as would be expected, little disturbance. The
preliminary test seldom prescribes pooling, so that the properties of the
“sometimes pool” test are very similar to those of the “never pool” test, although the
“never pool” procedure yields the slightly higher power.

The main objection to the use of the “sometimes pool” test is associated with
the intermediate values of $\theta_{21}$. If over a series of experiments $\theta_{21}$ has a moderate
value greater than one, the “sometimes pool” test at the 5% levels yields more
apparently significant results than are anticipated, and is also less powerful
than a corresponding “never pool” test. The magnitude of these undesirable
properties can be reduced somewhat by increasing the significance level of the
preliminary test.

2.3. Effect of preliminary tests made at the 25% level. Use of the 25%
instead of the 5% significance level for the preliminary test reduces, in general, the
probability of rejecting the hypothesis. This reduction, at intermediate values
of $\theta_{21}$, results in a reduction of the extreme disturbances. When the ratio $\theta_{21}$ is
equal to one, however, the effects are not as favourable. If the hypothesis is
true, still fewer apparently significant results occur. A final test being carried
out at the 5% level can now have an effective significance level close to 3.75%.
If the hypothesis is false, the test is still more powerful than a corresponding
“never pool” test but the gain is not as great as when a preliminary test at the
5% level is employed. Since most experimenters desire a reasonable amount of
protection against an error in judgement of the true value of $\theta_{21}$, the reduction
in disturbances for intermediate values of $\theta_{21}$, resulting from the use of the 25%
rather than the 5% level, would be judged to outweigh the disadvantages of the
compensating factors.

2.4. Effect of further increases in significance level. Increasing $P_1$, the
significance level of the preliminary test, decreases the probability of rejecting
the hypothesis only to the point where a critical value $\bar{P}_1$ is reached.
Increasing $P_1$ beyond this value results in an increase in the probability of
rejection. The properties of a “sometimes pool” test in which $P_1$ is less than
$\bar{P}_1$ differ, in general, from those of a test in which $P_1$ is greater than
$\bar{P}_1$.





Tests of the former type, which are referred to here as Class A tests, are the
tests commonly encountered in practice. Considering, for example, a test in
which $P_2 = P_3 = .05$ and $n_1 = 20$, $n_2 = 4$, $n_3 = 2$, we find the critical
value $\bar{P}_1$ to be .77, a figure much larger than the values .05 or .25
customarily chosen for $P_1$. The major portion of the present discussion deals with
Class A tests. Tests in which $P_1$ is greater than $\bar{P}_1$ are referred to as
Class B tests and discussion of their properties is relegated to a later section.
An expression for evaluating $\bar{P}_1$ is given in Subsection 4.3.


2.5. Effect of $P_2$, $P_3$. The probability levels ($P_2$, $P_3$) used for the final
test determine the properties of the “sometimes pool” test for extreme values of
$\theta_{21}$. When $\theta_{21}$ is equal to one, the effective significance level is less
than the nominal value $P_3$, but is not less than $(1 - P_1)P_3$. The power of such a
test is greater than the power of a corresponding “never pool” test, but less than
the power of a test in which one always pools and uses the $P_3$ level. For very
large values of $\theta_{21}$ the behaviour of the “sometimes pool” test approaches, in
all respects, the behaviour of a “never pool” test at the $P_2$ level.
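The bounds stated above for $\theta_{21} = 1$ are easy to tabulate. The following is a minimal sketch in Python; the function name is ours and is not part of the paper. It merely evaluates the interval from $(1 - P_1)P_3$ up to $P_3$ for the two preliminary levels discussed in the text.

```python
def type1_bounds_at_theta21_one(p1: float, p3: float) -> tuple:
    """Return (lower, upper) bounds on the effective significance level
    of a Class A "sometimes pool" test when theta21 = 1.

    lower = (1 - P1) * P3, upper = P3, as stated in the text.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p3 <= 1.0):
        raise ValueError("probability levels must lie in [0, 1]")
    return (1.0 - p1) * p3, p3


if __name__ == "__main__":
    for p1 in (0.05, 0.25):
        lower, upper = type1_bounds_at_theta21_one(p1, 0.05)
        print(f"P1 = {p1:.2f}: {lower:.4f} <= effective level < {upper:.4f}")
```

With $P_1 = .25$ and $P_3 = .05$ the lower bound is .0375, which is the 3.75% figure quoted in Subsection 2.3.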


2.6. Effect of $n_2$, $n_1$. The degrees of freedom $n_2$ and $n_1$, associated with
the mean squares that are sometimes pooled, clearly affect the magnitude of the
disturbance. Because analytic investigation becomes complex, the following
remarks are based on conjectures arising out of examination of a number of
numerical examples.

A large value of $n_2$ is desirable in two respects. As $n_2$ becomes larger the
preliminary test becomes more powerful and pooling is prescribed less often. In
addition, when pooling is prescribed the pooled mean square is further weighted
in favour of the valid error $\sigma_2^2$. Both factors contribute towards a decrease
in bias of the error mean square with a consequent reduction in the disturbance
introduced into the final test.

The effect of $n_1$ is not as simple. As $n_1$ becomes larger the preliminary test
again becomes more powerful and pooling is prescribed less often. But when
pooling is prescribed, the pooled mean square in this case is further weighted in
favour of $\sigma_1^2$, which is smaller than the valid error $\sigma_2^2$. The effect on
the final test, which is due to a combination of these two factors, clearly depends
on the value of $\theta_{21}$. For intermediate values of $\theta_{21}$ the latter factor is
the predominant one, and the disturbance of the effective significance level is
increased as $n_1$ is increased.

2.7. Class B Test. A Class B test is one in which the probability level ($P_1$)
of the preliminary test is greater than a critical value $\bar{P}_1$. Pooling is
prescribed only when the mean square $v_1$ is relatively large, with the result that
the error mean square tends to be too large. Accordingly, a Class B “sometimes
pool” test rejects the hypothesis less frequently than a “never pool” test at the
$P_2$ level. The effective significance level of a Class B test is less than $P_2$
for all values of $\theta_{21}$. It has its lowest value when $\theta_{21}$ is equal to one,
and approaches $P_2$ as $\theta_{21}$ becomes very large. Because pooling is prescribed
infrequently, little power is gained by the use of a Class B test rather than a
“never pool” test.

2.8. Recommendations. The principal conclusions discussed in the preceding
subsections may be summarized as follows: A preliminary test carried out at a
significance level as low as 5% affords little protection against errors in
judgement. If $\sigma_1^2$ is equal to $\sigma_2^2$ ($\theta_{21} = 1$) the reduction in errors of
inference is appreciable; but if, in fact, $\sigma_1^2$ is less than $\sigma_2^2$
($\theta_{21} > 1$), a greater number of incorrect inferences are made than if a
preliminary test is not employed at all. The use of the 25% significance level for
the preliminary test introduces the same disturbances but to a lesser extent.
Extreme increases in the effective significance level at possible values of
$\theta_{21}$ are reduced and losses in power at these values are not as serious. The
25% level provides a reasonable amount of protection against an error in judgement
regarding the true value of $\theta_{21}$. However, when $n_2$ is large relative to
$n_1$, a smaller significance level could be employed without introducing any
serious disturbances at the intermediate values of $\theta_{21}$, and with a resulting
gain in power at values of $\theta_{21}$ near one.

The following method of performing a preliminary test is recommended as one
which tends to stabilize the disturbances at intermediate values of $\theta_{21}$ while
still taking advantage of a considerable portion of the possible gain in power at
values of $\theta_{21}$ near one. The procedure consists of pooling the two mean squares
$v_2$ and $v_1$ only if their ratio $v_2/v_1$ is less than $2F_{50}$, where $F_{50}$ is the
50 per cent point of the F-distribution for $n_2$ and $n_1$ degrees of freedom. The
use of the multiple 2 is arbitrary and a smaller value may be used if the
experimenter desires additional control over extreme disturbances.
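The recommended rule can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the author's computation: the paper takes $F_{50}$ from published tables, whereas the sketch below obtains it numerically from the regularized incomplete beta function (standard continued-fraction evaluation) followed by bisection. All function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, dfn, dfd):
    """Pr{F(dfn, dfd) <= f}."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(dfn / 2.0, dfd / 2.0, dfn * f / (dfd + dfn * f))


def f_median(dfn, dfd):
    """50 per cent point of F(dfn, dfd), found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, dfn, dfd) < 0.5:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, dfn, dfd) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def should_pool(v1, v2, n1, n2, multiple=2.0):
    """Pool v1 and v2 only if v2 / v1 < multiple * F50(n2, n1)."""
    return v2 / v1 < multiple * f_median(n2, n1)
```

For equal degrees of freedom the median of F is exactly 1, so `should_pool` then reduces to comparing the ratio of mean squares with the multiple itself.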

This procedure has the advantage of admitting less disturbance over a larger
range of values of $n_2$ and $n_1$. The customary method prescribes pooling if the
null hypothesis ($\theta_{21} = 1$) of the preliminary test is not rejected at some
preassigned probability level $P_1$. If enough observations are available to provide
reliable values for $v_2$ and $v_1$, pooling is prescribed only if $\sigma_2^2$ and
$\sigma_1^2$ are essentially the same. However, if small numbers of degrees of freedom
are involved, the preliminary test is too weak to reject the hypothesis even if
$\sigma_1^2$ is appreciably less than $\sigma_2^2$, and pooling will be prescribed too
frequently. On the other hand, the use of the recommended procedure has the effect
of prescribing pooling only when it can be said, with confidence exceeding 50%,
that the true value of $\theta_{21}$ is less than some chosen value such as 2.

This can be demonstrated simply by considering a series of experiments
in which preliminary tests are performed. When $v_2/v_1 < 2F_{50}$, we make the
statement

(1)  $$\theta_{21} < 2,$$

and when $v_2/v_1 > 2F_{50}$, we make the statement

(2)  $$\theta_{21} > 2.$$

We have

$$\Pr\left\{\frac{v_2}{v_1}\cdot\frac{1}{\theta_{21}} > F_{50}\right\} = .50,$$

or

$$\Pr\left\{\frac{v_2}{v_1} > F_{50}\,\theta_{21}\right\} = .50.$$

If statement (1) is true,

$$\Pr\left\{\frac{v_2}{v_1} < 2F_{50}\right\} > .50;$$

and if statement (2) is true,

$$\Pr\left\{\frac{v_2}{v_1} > 2F_{50}\right\} > .50.$$

Thus, no matter what the true value of $\theta_{21}$, the statements are true more
than 50% of the time.
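The "more than 50% of the time" claim can be checked by simulation. The sketch below is our own illustration, not part of the paper: it assumes equal degrees of freedom (so that $F_{50} = 1$ and the rule compares $v_2/v_1$ with 2), draws mean squares as scaled chi-square variables, and counts how often the resulting statement about $\theta_{21}$ is correct. The seed, sample size and function names are arbitrary choices.

```python
import random


def mean_square(sigma2, df, rng):
    """Draw a mean square: sigma2 * chi-square(df) / df.

    chi-square(df) is a gamma variable with shape df/2 and scale 2.
    """
    return sigma2 * rng.gammavariate(df / 2.0, 2.0) / df


def fraction_correct(theta21, df=10, trials=20000, seed=1):
    """Fraction of experiments in which statement (1) or (2) is true.

    With n1 = n2 = df the 50 per cent point of F is 1, so 2 * F50 = 2.
    theta21 must differ from 2 (at exactly 2 neither statement holds).
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        v1 = mean_square(1.0, df, rng)        # sigma_1^2 = 1
        v2 = mean_square(theta21, df, rng)    # sigma_2^2 = theta21
        says_less_than_two = v2 / v1 < 2.0
        if says_less_than_two == (theta21 < 2.0):
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for theta21 in (1.0, 1.5, 3.0, 10.0):
        print(theta21, fraction_correct(theta21))
```

The fraction drifts toward one half as $\theta_{21}$ approaches 2 from either side, which is the sense in which the confidence "exceeds 50%".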

Fifty per cent points of the F-distribution have been tabulated by Merrington
and Thompson [3].

A simpler rule, and one which is nearly equivalent when the degrees of freedom
involved are each greater than 6, is to pool if the ratio of the mean squares is less
than 2, without any reference to the F-table. For smaller numbers of degrees of
freedom, however, this simpler rule does not embody the advantages of the
$2F_{50}$ rule, unless of course, $n_1$ and $n_2$ are equal.
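As a concrete illustration of the simpler rule, the following Python sketch pools the two error mean squares when $v_2/v_1 < 2$ and returns the final F ratio together with its error degrees of freedom. The function names and the numerical inputs are hypothetical; the pooled mean square is the usual weighted average $(n_1 v_1 + n_2 v_2)/(n_1 + n_2)$.

```python
def pooled_error(v1, n1, v2, n2):
    """Pooled error mean square and its degrees of freedom."""
    return (n1 * v1 + n2 * v2) / (n1 + n2), n1 + n2


def final_f_ratio_simple_rule(v1, n1, v2, n2, v3, cutoff=2.0):
    """Apply the 'pool if v2 / v1 < 2' rule and form the final F ratio.

    Returns (F ratio, error degrees of freedom, pooled flag).
    v3 is the treatment mean square; v2 is the valid error mean square.
    """
    if v2 / v1 < cutoff:
        error_ms, error_df = pooled_error(v1, n1, v2, n2)
        pooled = True
    else:
        error_ms, error_df, pooled = v2, n2, False
    return v3 / error_ms, error_df, pooled


if __name__ == "__main__":
    # Hypothetical mean squares: v1 = 4 (20 d.f.), v2 = 6 (4 d.f.), v3 = 30.
    print(final_f_ratio_simple_rule(4.0, 20, 6.0, 4, 30.0))
    # Same data with a larger v2, so that pooling is not prescribed.
    print(final_f_ratio_simple_rule(4.0, 20, 12.0, 4, 30.0))
```

The caller still has to refer the returned ratio to the F-table with the returned error degrees of freedom, at the $P_3$ level if pooled and at the $P_2$ level otherwise.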


III. Numerical Illustrations 


3.1. Effect of $P_1$ illustrated. An example of the influence of $P_1$ on the
effective significance level or Type I error of a “sometimes pool” test is
illustrated in Figure 1. When $P_1 = 0$, the Type I error has its maximum value
equivalent to the Type I error of an “always pool” test at the $P_3$ level. As $P_1$
increases from zero, the Type I error decreases until at $P_1 = \bar{P}_1$ (.77 in this
case) it reaches its minimum value at a level less than $P_2$. As $P_1$ increases from
$\bar{P}_1$, the Type I error increases until, at $P_1 = 1$, the Type I error is equal
to $P_2$.

The influence of $P_1$ on the power of a “sometimes pool” test is illustrated in
Figure 2. The gain in power, as a function of $\theta_{21}$, is presented for three
Class A tests. Since comparisons of power are made over tests having different
Type I errors, the gain is expressed as the proportion actually attained of the
total gain in power that is possible if the true value of $\theta_{21}$ is actually
known. When $P_1 = \bar{P}_1 = .77$, the curve is observed to decrease monotonically to
zero. However, for lower values of $P_1$, the preliminary test prescribes pooling
more often, and more power is gained when $\theta_{21}$ is near one but less power is
gained or power is actually lost when $\theta_{21}$ is large.



The power gained or lost at various values of $\theta_{21}$ is illustrated in Table I.
The probability of rejecting the hypothesis for the “sometimes pool” test is
tabulated opposite “s.p.”, and for the “never pool” test having the same Type I
error opposite “n.p.”.

Fig. 1. Effect of Varying $P_1$. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$ and $P_2 = P_3 = .05$.
(a) Upper diagram: Class A Tests. (b) Lower diagram: Class B Tests.

Fig. 2. Proportion of Possible Gain in Power Actually Attained. $n_1 = 20$, $n_2 = 4$,
$n_3 = 2$, $P_2 = P_3 = .05$.



The last line of the table approaches the probabilities for a “never pool” test
having a Type I error of 5%. Except for values very near
$(\theta_{21}, \theta_{32}) = (1, 1)$, the probability of rejecting the null hypothesis,
using a “sometimes pool” test, is greater than if a “never pool” test at the 5%
level is used. In this sense, the “power” of the “sometimes pool” test is greater
everywhere except near $(\theta_{21}, \theta_{32}) = (1, 1)$.

TABLE I

Comparison of Power of a “Sometimes Pool” (s.p.) Test and Corresponding “Never Pool”
(n.p.) Tests

$n_1 = 20$, $n_2 = 4$, $n_3 = 2$; $P_1 = P_2 = P_3 = .05$

Value of                 Type I             Value of $\theta_{32}$
$\theta_{21}$    Test    ($\theta_{32} = 1$)    1.8    2.8    4.3    7.1    12.5   25     50     250

1.0      s.p.    .048       .164   .299   .443   .599   .739   .855   .922   .984
         n.p.    .048       .112   .192   .297   .441   .604   .765   .870   .972
1.2      s.p.    .067       .200   .338   .476   .621   .751   .860   .925   .984
         n.p.    .067       .149   .245   .361   .508   .662   .805   .895   .978
1.6      s.p.    .102       .248   .379   .503   .632   .750   .855   .921   .983
         n.p.    .102       .210   .323   .447   .592   .730   .849   .920   .983
2.0      s.p.    .127       .271   .390   .500   .619   .736   .845   .915   .981
         n.p.    .127       .250   .370   .497   .636   .764   .870   .932   .986
2.5      s.p.    .146       .278   .382   .482   .596   .716   .831   .907   .975
         n.p.    .146       .278   .402   .528   .664   .784   .882   .938   .987
4.5      s.p.    .148       .233   .300   .399   .520   .657   .796   .887   .976
         n.p.    .148       .280   .405   .531   .666   .786   .883   .939   .987
7.0      s.p.    .117       .182   .265   .350   .482   .632   .781   .880   .974
         n.p.    .117       .234   .352   .478   .620   .751   .862   .927   .985
10       s.p.    .091       .162   .227   .327   .466   .621   .776   .877   .974
         n.p.    .091       .191   .300   .422   .569   .712   .838   .913   .982
16       s.p.    .067       .130   .209   .313   .456   .615   .773   .875   .973
         n.p.    .067       .149   .246   .361   .509   .662   .805   .895   .978
100      s.p.    .051       .117   .200   .307   .452   .613   .771   .875   .973
         n.p.    .051       .118   .201   .308   .454   .615   .773   .875   .973

Below the heavy line the s.p. test is less powerful than the n.p. test.
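The “never pool” rows of Table I can be reproduced from a closed form, because with $n_3 = 2$ the upper tail of the F-distribution is elementary: $\Pr\{F(2, n_2) > f\} = (1 + 2f/n_2)^{-n_2/2}$. The Python sketch below is our own check under that assumption (the function name is hypothetical); it returns the power of a “never pool” test whose Type I error is `alpha`.

```python
def never_pool_power(alpha, theta32, n2):
    """Power of the 'never pool' F test when n3 = 2.

    The test rejects when v3 / v2 exceeds the upper alpha point of F(2, n2),
    and v3 / v2 is distributed as theta32 * F(2, n2).
    """
    half = n2 / 2.0
    # 2 * F_alpha / n2, obtained by inverting (1 + 2 f / n2) ** (-n2 / 2) = alpha
    scaled_critical_value = alpha ** (-1.0 / half) - 1.0
    return (1.0 + scaled_critical_value / theta32) ** (-half)


if __name__ == "__main__":
    # Compare with the n.p. row of Table I whose Type I error is .067 (n2 = 4).
    for theta32 in (1.8, 2.8, 4.3, 7.1, 12.5, 25, 50, 250):
        print(theta32, round(never_pool_power(0.067, theta32, 4), 3))
```

Because the tabulated Type I errors are rounded to three decimals, agreement with the table is only to about the last printed digit.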

3.2. Effect of $P_2$, $P_3$ illustrated. The influence of the probability levels
employed in the final phase of a “sometimes pool” test is illustrated in Figure 3.
The main effect is observed to be the manner in which the behaviour is
constrained at the extreme values of $\theta_{21}$.



Fig. 3. Class A Tests, $n_1 = 20$, $n_2 = 4$, $n_3 = 2$.

Fig. 4. (a) Upper Diagram: Effect of Varying $n_2$, $P_1 = P_2 = P_3 = .05$ and $n_1 = 20$,
$n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$, $P_1 = P_2 = P_3 = .05$ and $n_2 = 4$,
$n_3 = 2$.

3.3. Effect of $n_2$, $n_1$ illustrated. The response of the Type I error to increases
in the degrees of freedom of the preliminary test is illustrated in Figure 4. The
maximum disturbance is observed to increase as $n_1$ increases or as $n_2$ decreases.





3.4. Class B test illustrated. The behaviour of the Type I error of some Class
B tests is illustrated in Figure 1(b). The hypothesis is always rejected less
frequently than if a “never pool” test at the $P_2$ level is used.

Fig. 5. (a) Upper Diagram: Effect of Varying $n_2$ when $F_1 = 2F_{50}$, $P_2 = P_3 = .05$ and
$n_1 = 20$, $n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$ when $F_1 = 2F_{50}$,
$P_2 = P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.

Fig. 6. Effect of Varying $n_1$ when $P_3 > P_2$. $P_3 = .10$, $P_2 = .05$ and $n_2 = 4$,
$n_3 = 2$.

3.5. Recommended procedure illustrated. Figure 5 illustrates the behaviour
of the Type I error when the recommended procedure is applied to the special
cases of Figure 4. When $n_1 = 12$, $n_2 = 4$, the 20% probability level is
prescribed and the Type I error never exceeds .09. When $n_1 = 20$, $n_2 = 20$, the
more liberal value of 6% is prescribed and the resulting Type I error never
exceeds .07. The more liberal choice of $P_1$ results in a greater gain of power,
near $\theta_{21} = 1$, than would have resulted if the 20% level had been used
throughout. A small loss in power occurs when $\theta_{21}$ is large. Should the
experimenter wish to guard against this loss in power for a larger range of values
of $\theta_{21}$ near one, he may do so, at the expense of a somewhat larger disturbance
in the Type I error, by choosing $P_3$ larger than $P_2$. In the present example, if
$P_3$ is taken as .10 instead of .05, Figure 6 shows that the Type I error is changed
only slightly for values of $\theta_{21}$ near one, but the maximum disturbance is
increased. Such a test is uniformly more powerful than the “never pool” test for
all values of $\theta_{21}$ for which the Type I error is less than .10; a much larger
range of values than in the previous case.

IV. Derivations and Proofs 


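4.1. Derivation of joint frequency function. The joint frequency function of
the $v$'s is given by

$$c\, v_1^{\frac12 n_1 - 1}\, v_2^{\frac12 n_2 - 1}\, v_3^{\frac12 n_3 - 1} \exp\left\{-\frac12\left[\frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2} + \frac{n_3 v_3}{\sigma_3^2}\right]\right\},$$

where $c$ is independent of the $v$'s. Transform to new variables:

$$u_1 = \frac{n_2 v_2}{n_1 v_1}, \qquad u_2 = \frac{n_3 v_3}{n_2 v_2}, \qquad w = \frac{n_1 v_1}{\sigma_1^2}.$$

By integrating with respect to $w$ and evaluating the constant, the joint frequency
function of $u_1$ and $u_2$ is obtained:

(3)  $$p(u_1, u_2) = \frac{\theta_{21}^{\frac12 n_1}\, \theta_{32}^{\frac12 (n_1 + n_2)}\, u_1^{\frac12 (n_2 + n_3) - 1}\, u_2^{\frac12 n_3 - 1}}{B(\tfrac12 n_1, \tfrac12 n_2)\, B(\tfrac12 n_3, \tfrac12 (n_1 + n_2))\, (\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2)^{\frac12 (n_1 + n_2 + n_3)}},$$

where $\theta_{21} = \sigma_2^2/\sigma_1^2$; $\theta_{32} = \sigma_3^2/\sigma_2^2$.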

4.2. Definition of critical region. The rule of procedure for the “sometimes
pool” test may now be expressed in terms of the $u$'s. Reject the hypothesis
$\theta_{32} = 1$ if

$$\left\{ u_1 \ge u_1^0,\; u_2 \ge u_2^0 \right\}$$

or

$$\left\{ u_1 < u_1^0,\; \frac{u_1 u_2}{1 + u_1} \ge u_3^0 \right\},$$

where

$$u_1^0 = \frac{n_2}{n_1} F_1(n_2, n_1; P_1), \qquad u_2^0 = \frac{n_3}{n_2} F_2(n_3, n_2; P_2), \qquad u_3^0 = \frac{n_3}{n_1 + n_2} F_3(n_3, n_1 + n_2; P_3).$$

The reader will note that the $u$'s are ratios of sums of squares. The symbol $u_1$
is associated with the preliminary test. The final test when pooling is not
prescribed is associated with the symbol $u_2$, and when pooling is prescribed the
relevant statistic is $u_1 u_2/(1 + u_1)$.
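The rule of procedure can be written either in terms of the $u$'s or in the familiar form of two F ratios, and the two forms should agree. The Python sketch below illustrates this; it assumes $u_1 = n_2 v_2/(n_1 v_1)$ and $u_2 = n_3 v_3/(n_2 v_2)$, so that $u_1 u_2/(1 + u_1)$ is the ratio of the treatment sum of squares to the pooled error sum of squares. The function names are ours, and the critical values $F_1$, $F_2$, $F_3$ are passed in by the caller (for instance from an F-table).

```python
def reject_in_u_form(v1, v2, v3, n1, n2, n3, f1, f2, f3):
    """'Sometimes pool' decision expressed through u1 and u2."""
    u1 = n2 * v2 / (n1 * v1)
    u2 = n3 * v3 / (n2 * v2)
    u1_crit = n2 * f1 / n1
    u2_crit = n3 * f2 / n2
    u3_crit = n3 * f3 / (n1 + n2)
    if u1 >= u1_crit:                      # preliminary test significant: do not pool
        return u2 >= u2_crit
    return u1 * u2 / (1.0 + u1) >= u3_crit  # pool the two error sums of squares


def reject_in_f_form(v1, v2, v3, n1, n2, n3, f1, f2, f3):
    """The same decision written with mean-square ratios."""
    if v2 / v1 >= f1:
        return v3 / v2 >= f2
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return v3 / pooled >= f3


if __name__ == "__main__":
    # Approximate 5% points for (4, 20), (2, 4) and (2, 24) degrees of freedom.
    crit = (2.87, 6.94, 3.40)
    for v1, v2, v3 in [(1.0, 1.0, 10.0), (1.0, 3.0, 10.0), (1.0, 3.0, 40.0)]:
        print(v1, v2, v3, reject_in_u_form(v1, v2, v3, 20, 4, 2, *crit))
```

Writing the rule in the $u$ form is what makes the critical region a simple set in the $(u_1, u_2)$ plane.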

The critical region defined in this way is illustrated in the two dimensional
sample space $[u_1, u_2]$ of Figure 7(a). The critical regions of the “never pool”
and the “always pool” test are readily identified in this figure. The region of a
“never pool” test at the $P_2$ level is designated by $A + B_1 + C$, the area above
the line $u_2 = u_2^0$; and the region of an “always pool” test at the $P_3$ level is
designated by $B_1 + B_2 + C + D$, the area above the curve
$u_1 u_2 = u_3^0(1 + u_1)$. The critical region of the “sometimes pool” test,
$B_1 + B_2 + C$, may be considered in two parts; the portion due to pooling,
$B_1 + B_2$, and the portion due to not pooling, $C$.

Fig. 7. Critical Region of “Sometimes Pool” Test. (a) Left: Class A Test:
$u_1^0 > \bar{u}_1$. (b) Right: Class B Test: $u_1^0 < \bar{u}_1$.


The probability of rejecting the null hypothesis is given by

(4)  $$Q(\theta_{21}, \theta_{32}) = \int_0^{u_1^0}\!\int_{w}^{\infty} p \, du_2\, du_1 + \int_{u_1^0}^{\infty}\!\int_{u_2^0}^{\infty} p \, du_2\, du_1,$$

where $p$ is the frequency function (3), and $w = u_3^0(1 + u_1)/u_1$.

Simple explicit expressions for these integrals cannot be obtained in general,
but when $n_3 = 2$ they can be reduced to forms containing incomplete beta
functions. This special case is dealt with in Subsection 4.7.


4.3. Critical value of $P_1$. The symbol $\bar{u}_1$ in Figure 7 is used to denote the
$u_1$ coordinate of the point of intersection of the line $u_2 = u_2^0$ and the curve
$u_1 u_2 = u_3^0(1 + u_1)$. Accordingly,

(5)  $$\bar{u}_1 = \frac{u_3^0}{u_2^0 - u_3^0},$$

a value readily determined for any given test. This relationship may be expressed
in terms of the $F$'s as

(6)  $$\bar{F}_1 = \frac{n_1 F_3}{(n_1 + n_2) F_2 - n_2 F_3},$$

where $\bar{F}_1$ is defined by $n_1 \bar{u}_1 = n_2 \bar{F}_1$. The probability level
corresponding to $\bar{F}_1$ is denoted by $\bar{P}_1$.

The critical value $\bar{P}_1$ is the value of $P_1$ which divides the possible
“sometimes pool” tests into two types having different properties. If $P_1$ is less
than $\bar{P}_1$ ($F_1 > \bar{F}_1$ or $u_1^0 > \bar{u}_1$), the test is referred to as a
Class A test. If $P_1$ is greater than $\bar{P}_1$ ($F_1 < \bar{F}_1$ or
$u_1^0 < \bar{u}_1$), the test is referred to as a Class B test.
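The critical value quoted in Subsection 2.4 can be recomputed from this intersection. Solving $u_2 = u_2^0$ together with $u_1 u_2 = u_3^0(1 + u_1)$ gives $\bar{u}_1 = u_3^0/(u_2^0 - u_3^0)$, and hence $\bar{F}_1 = n_1 F_3/((n_1 + n_2)F_2 - n_2 F_3)$. The Python sketch below is our own check, restricted to $n_3 = 2$ and even $n_1$, $n_2$ so that only elementary functions are needed; the function names are hypothetical.

```python
import math


def f_upper_point_2df(p, dfd):
    """Upper 100p% point of F(2, dfd), from Pr{F > f} = (1 + 2f/dfd)^(-dfd/2)."""
    return (dfd / 2.0) * (p ** (-2.0 / dfd) - 1.0)


def f_upper_tail_even(f, dfn, dfd):
    """Pr{F(dfn, dfd) > f} for even dfn and dfd.

    For integer a = dfn/2 and b = dfd/2 the incomplete beta function I_z(a, b)
    equals a binomial tail sum with a + b - 1 trials and success chance z.
    """
    a, b = dfn // 2, dfd // 2
    z = dfn * f / (dfd + dfn * f)
    m = a + b - 1
    cdf = sum(math.comb(m, j) * z ** j * (1.0 - z) ** (m - j)
              for j in range(a, m + 1))
    return 1.0 - cdf


def critical_p1(n1, n2, p2, p3):
    """Critical preliminary level (P1-bar) for n3 = 2 and even n1, n2."""
    f2 = f_upper_point_2df(p2, n2)            # F2(2, n2; P2)
    f3 = f_upper_point_2df(p3, n1 + n2)       # F3(2, n1 + n2; P3)
    f1_bar = n1 * f3 / ((n1 + n2) * f2 - n2 * f3)
    return f_upper_tail_even(f1_bar, n2, n1)  # tail of F(n2, n1) beyond F1-bar


if __name__ == "__main__":
    print(round(critical_p1(20, 4, 0.05, 0.05), 3))
```

For $n_1 = 20$, $n_2 = 4$, $P_2 = P_3 = .05$ this agrees, to two decimals, with the value .77 given in the text.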

4.4. Lemma 1.

LEMMA 1. If $\theta_{21}' \ge \theta_{21}$ and $\theta_{32}' \ge \theta_{32}$, and if the equality
applies in one of these, then the ratio of the frequency functions (3)

(7)  $$\frac{p(u_1, u_2 \mid \theta_{21}', \theta_{32}')}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})}$$

increases monotonically as (i) $u_1$ increases with $u_2$ fixed, or as (ii) $u_2$
increases with $u_1$ fixed, or as (iii) $u_1$ increases on a fixed pooling curve
$u_1 u_2 = u_3(1 + u_1)$.

Proof. The ratio (7) is a monotonic function of

$$\frac{\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2}{\theta_{21}'\theta_{32}' + \theta_{32}' u_1 + u_1 u_2}.$$

It is easily shown that an expression of the form $(a + bx)/(c + dx)$ increases
monotonically with respect to $x$ if $a/c < b/d$, and this condition holds for cases
(i), (ii), and (iii).
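The monotonicity used in the proof can be spot-checked numerically. The Python sketch below is only an illustration with arbitrarily chosen parameter values (both $\theta_{21}$ values are taken not less than one, as in the paper's setting); it evaluates the quotient $(\theta_{21}\theta_{32} + \theta_{32}u_1 + u_1u_2)/(\theta_{21}'\theta_{32}' + \theta_{32}'u_1 + u_1u_2)$ along the three paths (i), (ii) and (iii). The helper names are ours.

```python
def inner_ratio(u1, u2, theta, theta_prime):
    """Quotient of which the density ratio (7) is an increasing function."""
    t21, t32 = theta
    p21, p32 = theta_prime
    return (t21 * t32 + t32 * u1 + u1 * u2) / (p21 * p32 + p32 * u1 + u1 * u2)


def is_increasing(values):
    """True when each value is strictly larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


theta, theta_prime = (1.5, 1.0), (3.0, 2.0)   # arbitrary, with theta' >= theta
grid = [0.1 * k for k in range(1, 60)]
c = 0.3                                       # constant of an arbitrary pooling curve

case_i = [inner_ratio(u1, 2.0, theta, theta_prime) for u1 in grid]
case_ii = [inner_ratio(0.7, u2, theta, theta_prime) for u2 in grid]
case_iii = [inner_ratio(u1, c * (1.0 + u1) / u1, theta, theta_prime) for u1 in grid]

if __name__ == "__main__":
    print(is_increasing(case_i), is_increasing(case_ii), is_increasing(case_iii))
```

A grid check of this kind is of course no substitute for the algebraic condition $a/c < b/d$; it only guards against sign errors when the quotient is coded.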

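4.5. Lemma 2.

LEMMA 2. If area $L$ lies above a given pooling curve, and to the right of a given
preliminary line, if area $K$ lies below the same pooling curve, and to the left of
the same preliminary line, and if

$$\Pr\{L \mid \theta_{21}, \theta_{32}\} \ge \Pr\{K \mid \theta_{21}, \theta_{32}\},$$

then

$$\Pr\{L \mid \theta_{21}', \theta_{32}'\} > \Pr\{K \mid \theta_{21}', \theta_{32}'\},$$

where $\theta_{21}' \ge \theta_{21}$ and $\theta_{32}' \ge \theta_{32}$ and the equality applies in
one of these.

Proof. For any point $(u_1, u_2)$ in $K$ and any point $(u_1', u_2')$ in $L$, Lemma 1
(iii) yields

$$\frac{p(u_1, u_2 \mid \theta_{21}', \theta_{32}')}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})} < \frac{p(u_1', u_2^* \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2^* \mid \theta_{21}, \theta_{32})},$$

where $u_2^* = c(1 + u_1')/u_1'$, and $c$ is a constant defined by
$u_2 = c(1 + u_1)/u_1$. Since $K$ is below a given pooling curve, $u_2^* < u_2'$ and

$$\frac{p(u_1', u_2^* \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2^* \mid \theta_{21}, \theta_{32})} < \frac{p(u_1', u_2' \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2' \mid \theta_{21}, \theta_{32})}.$$

Consider

$$\frac{p(u_1, u_2 \mid \theta_{21}', \theta_{32}')}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})} < b < \frac{p(u_1', u_2' \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2' \mid \theta_{21}, \theta_{32})},$$

where $b$ is a constant such that the inequalities hold for all $(u_1, u_2)$ in $K$ and
all $(u_1', u_2')$ in $L$.

Integrating over the regions yields

$$\Pr\{K \mid \theta_{21}', \theta_{32}'\} < b \cdot \Pr\{K \mid \theta_{21}, \theta_{32}\}$$

and

$$b \cdot \Pr\{L \mid \theta_{21}, \theta_{32}\} < \Pr\{L \mid \theta_{21}', \theta_{32}'\}.$$

But

$$\Pr\{K \mid \theta_{21}, \theta_{32}\} \le \Pr\{L \mid \theta_{21}, \theta_{32}\},$$

thus

$$\Pr\{K \mid \theta_{21}', \theta_{32}'\} < \Pr\{L \mid \theta_{21}', \theta_{32}'\},$$

which completes the proof.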


4.6. General Properties.

Result 1. When $\theta_{21} = 1$, the Type I error of a Class A test is less than $P_3$.

Proof. In the notation of Fig. 7(a), the probability of falling in
$B_1 + B_2 + C + D$ is $P_3$ when $\theta_{21} = 1$ and $\theta_{32} = 1$. The region of
rejection of the “sometimes pool” test is smaller by $D$.

Result 2. When $\theta_{21} = 1$, the Type I error of a Class A test is greater than
$(1 - P_1)P_3$.

Proof. The statistics $u_1$ and $u_1 u_2/(1 + u_1)$ are independent when $\theta_{21} = 1$
and $\theta_{32} = 1$. Under these conditions, the probability of falling in
$B_1 + B_2$, in the notation of Fig. 7(a), is equal to the product of two incomplete
beta functions having the values $(1 - P_1)$ and $P_3$. Consequently, the Type I error
is greater than $(1 - P_1)P_3$.

Result 3. The Type I error approaches $P_2$ as $\theta_{21}$ approaches infinity.

Proof. The distribution becomes singular when $\theta_{21} = \infty$. The frequency
function approaches zero uniformly for any finite value of $u_1$ and approaches

$$\frac{1}{B(\tfrac12 n_3, \tfrac12 n_2)} \cdot \frac{u_2^{\frac12 n_3 - 1}}{(1 + u_2)^{\frac12 (n_3 + n_2)}}$$

at $u_1 = \infty$. When $\theta_{21} = \infty$, the entire mass is concentrated on the line
$u_1 = \infty$ and is distributed as a beta variable along that line. In the notation
of Fig. 7(a), $\Pr\{B_1 + B_2\} \to 0$ and $\Pr\{C\} \to P_2$.

Result 4. If the Type I error of a Class A test is $Q_0$ for $\theta_{21}$, then for
$\theta_{21}' > \theta_{21}$, the Type I error is greater than $r$, where $r$ is equal to the
lesser of $Q_0$ and $P_2$.



Three useful corollaries are associated with the above result:

Result 4.1. If at $\theta_{21} = 1$, the value of the Type I error is less than $P_2$,
this is its minimum value for any $\theta_{21}$.

Result 4.2. If at $\theta_{21} = 1$, the Type I error is less than $P_2$, then as
$\theta_{21}$ increases from 1 the Type I error increases monotonically until $P_2$ is
reached.

Result 4.3. If for some value of $\theta_{21}$ the Type I error is equal to or greater
than $P_2$, then for any larger value of $\theta_{21}$, the Type I error is greater than
$P_2$.

Proof. Let the regions of Fig. 8 be denoted by $R_1 = A_1 + B_1 + C_1$ with
similar designations for $R_2$ and $R_3$. Let
$R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$.

If $r = Q_0$, let the non-pooling line between $R_1$ and $R_2$ in Fig. 8 correspond to
$Q_0$ for all $\theta_{21}$. Then $\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$,
whence $\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. By
Lemma 2, we have for any $\theta_{21}' > \theta_{21}$,
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}', 1\} > \Pr\{A_1 \mid \theta_{21}', 1\}$ and
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 \mid \theta_{21}', 1\} = Q_0$.

Fig. 8. Critical Regions for Result 4.

If $r = P_2$, let the non-pooling line at the lower boundary of $R_3$ in Fig. 8
correspond to $Q_0$ for all $\theta_{21}$. Then, in the same way
$\Pr\{B_4 \mid \theta_{21}, 1\} = \Pr\{A_1 + A_2 + A_3 + C_3 \mid \theta_{21}, 1\}$ and
$\Pr\{B_4 \mid \theta_{21}', 1\} > \Pr\{A_1 + A_2 + A_3 \mid \theta_{21}', 1\}$ by Lemma 2. Thus
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 + A_3 + B_3 \mid \theta_{21}', 1\}$ and
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 \mid \theta_{21}', 1\} = P_2$.

Result 5. For a Class B test, the Type I error is less than $P_2$ for all $\theta_{21}$.

Proof. Figure 7(b) illustrates the critical region of a Class B test. We have
$\Pr\{A + B + C_1 + C_2 + C_3\} = P_2$. But the region of rejection of the “sometimes
pool” test is smaller, excluding $A$.

Result 6. The Type I error of a Class B test, for $\theta_{21} = 1$, is greater than
$(1 - \bar{P}_1)P_3$.

Proof. Changing $P_1$ to $\bar{P}_1$ removes $C_1$ from the region of rejection in Fig.
7(b), thus decreasing the Type I error. The modified test lies in both Class B
and Class A, so that Result 2 applies.

Result 7. For any $\theta_{21}$, the Type I error is a minimum for changes of $P_1$ when
$P_1 = \bar{P}_1$.

Proof. For a Class A test, changing $P_1$ to $\bar{P}_1$ removes region $B_2$ of Fig. 7(a),
thus decreasing the Type I error. For a Class B test, changing $P_1$ to $\bar{P}_1$
removes region $C_1$ of Fig. 7(b), similarly decreasing the Type I error.

Result 8. A Class A test, in which the Type I error is less than or equal to $P_2$, is
more powerful than a “never pool” test having the same Type I error.

Proof. In Fig. 8, let region $R_1 = A_1 + B_1 + C_1$ be equal in size to
$R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$. Then
$\Pr\{R_1 \mid \theta_{21}, 1\} = \Pr\{R_4 \mid \theta_{21}, 1\}$ and
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. Increasing
$\theta_{32} = 1$ to $\theta_{32}'$ and applying Lemma 2 yields
$\Pr\{R_4 \mid \theta_{21}, \theta_{32}'\} > \Pr\{R_1 \mid \theta_{21}, \theta_{32}'\}$.

Result 9. For a fixed Type I error a Class A test, carried out at given levels of
$P_2$ and $P_3$, is more powerful than a Class B test at the same levels.

Proof. Fig. 7 and Lemma 2 apply at once.


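4.7. Closed form expressions for $n_3 = 2$. The probability of rejecting the
hypothesis in a “sometimes pool” test is given by $Q(\theta_{21}, \theta_{32}) = Q_1 + Q_2$
where $Q_1$ corresponds to the region $B$, and $Q_2$ to the region $C$ of Fig. 7.

The integrals (4) representing the probability of rejecting the null hypothesis,
reduce, when $n_3 = 2$, to

(8)  $$Q_1 = \left[1 + \frac{u_3^0}{\theta_{32}}\right]^{-\frac12 n_2} \left[1 + \frac{u_3^0}{\theta_{21}\theta_{32}}\right]^{-\frac12 n_1} I_z(\tfrac12 n_2, \tfrac12 n_1),$$

where the argument $z$ of the incomplete beta function is defined by $z = x/(1 + x)$
where

(9)  $$x = \frac{u_1^0}{\theta_{21}} \cdot \frac{1 + u_3^0/\theta_{32}}{1 + u_3^0/(\theta_{21}\theta_{32})}.$$

Under the null hypothesis $\theta_{32} = 1$;

(10)  $$Q_1 = \left[\frac{1 + u_3^0}{1 + u_3^0/\theta_{21}}\right]^{\frac12 n_1} P_3 \cdot I_z(\tfrac12 n_2, \tfrac12 n_1),$$

since

$$P_3 = \frac{1}{(1 + u_3^0)^{\frac12 (n_2 + n_1)}}.$$

Similarly

(11)  $$Q_2 = \left[1 + \frac{u_2^0}{\theta_{32}}\right]^{-\frac12 n_2} I_{z'}(\tfrac12 n_1, \tfrac12 n_2),$$

where the argument $z'$ of the incomplete beta function is defined by
$z' = 1/(1 + x')$ where

(12)  $$x' = \frac{u_1^0}{\theta_{21}} \left(1 + \frac{u_2^0}{\theta_{32}}\right).$$

Under the null hypothesis $\theta_{32} = 1$,

(13)  $$Q_2 = I_{z'}(\tfrac12 n_1, \tfrac12 n_2) \cdot P_2,$$

since

$$P_2 = \frac{1}{(1 + u_2^0)^{\frac12 n_2}}.$$

The incomplete beta function has been tabulated by Pearson [4].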
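The closed forms for $n_3 = 2$ lend themselves to direct computation. The Python sketch below is our own illustration rather than the author's procedure: it assumes $Q_1 = [1 + u_3^0/\theta_{32}]^{-n_2/2}[1 + u_3^0/(\theta_{21}\theta_{32})]^{-n_1/2} I_z(\tfrac12 n_2, \tfrac12 n_1)$ and $Q_2 = [1 + u_2^0/\theta_{32}]^{-n_2/2} I_{z'}(\tfrac12 n_1, \tfrac12 n_2)$, and it is restricted to even $n_1$ and $n_2$ so that the incomplete beta function is a finite binomial sum. The function names are hypothetical, and the preliminary critical value is found by bisection instead of being read from a table.

```python
import math


def inc_beta_int(z, a, b):
    """I_z(a, b) for positive integers a, b, as a binomial tail sum."""
    m = a + b - 1
    return sum(math.comb(m, j) * z ** j * (1.0 - z) ** (m - j)
               for j in range(a, m + 1))


def f_upper_point_2df(p, dfd):
    """Upper 100p% point of F(2, dfd)."""
    return (dfd / 2.0) * (p ** (-2.0 / dfd) - 1.0)


def f_upper_point_even(p, dfn, dfd):
    """Upper 100p% point of F(dfn, dfd), even dfn and dfd, by bisection."""
    def tail(f):
        return 1.0 - inc_beta_int(dfn * f / (dfd + dfn * f), dfn // 2, dfd // 2)
    lo, hi = 0.0, 1.0
    while tail(hi) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rejection_probability(theta21, theta32, n1, n2, p1, p2, p3):
    """Q = Q1 + Q2 for the 'sometimes pool' test with n3 = 2."""
    h1, h2 = n1 // 2, n2 // 2
    u1c = (n2 / n1) * f_upper_point_even(p1, n2, n1)
    u2c = (2.0 / n2) * f_upper_point_2df(p2, n2)
    u3c = (2.0 / (n1 + n2)) * f_upper_point_2df(p3, n1 + n2)
    s = u3c / (theta21 * theta32)
    t = u3c / theta32
    x = (u1c / theta21) * (1.0 + t) / (1.0 + s)
    q1 = (1.0 + t) ** (-h2) * (1.0 + s) ** (-h1) * inc_beta_int(x / (1.0 + x), h2, h1)
    x_prime = (u1c / theta21) * (1.0 + u2c / theta32)
    q2 = (1.0 + u2c / theta32) ** (-h2) * inc_beta_int(1.0 / (1.0 + x_prime), h1, h2)
    return q1 + q2


if __name__ == "__main__":
    # Type I error (theta32 = 1) of the test of Table I at a few values of theta21.
    for theta21 in (1.0, 1.6, 100.0):
        print(theta21, round(rejection_probability(theta21, 1.0, 20, 4, .05, .05, .05), 3))
```

For $\theta_{21} = 1$ the computed Type I error is close to the entry .048 in the first line of Table I.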

The author wishes to thank Professor W. G. Cochran and Professor John W.
Tukey for helpful advice in the preparation of this paper.


[1] T. A. Bancroft, "On biases in estimation due to the use of preliminary tests of
significance", Annals of Math. Stat., Vol. 15 (1944), pp. 190-204.

[2] Frederick Mosteller, "On pooling data", Jour. Am. Stat. Assn., Vol. 43 (1948),
pp. 231-242.

[3] M. Merrington and C. M. Thompson, "Tables of percentage points of the inverted
beta (F) distribution", Biometrika, Vol. 33 (1943), pp. 73-88.

[4] Karl Pearson, Tables of the Incomplete Beta Function, Cambridge University Press,
1934.



ESTIMATING THE MEAN AND VARIANCE OF NORMAL POPULATIONS
FROM SINGLY TRUNCATED AND DOUBLY TRUNCATED SAMPLES¹

By A. C. Cohen, Jr.

The University of Georgia

1. Summary. This paper is concerned with the problem of estimating the
mean and variance of normal populations from singly and doubly truncated
samples having known truncation points. Maximum likelihood estimating
equations are derived which, with the aid of standard tables of areas and ordinates
of the normal frequency function, can be readily solved by simple iterative
processes. Asymptotic variances and covariances of these estimates are obtained
from the information matrices. Numerical examples are given which
illustrate the practical application of these results. In Sections 3 to 8 inclusive,
the following cases of doubly truncated samples are considered: I, number of
unmeasured observations unknown; II, number of unmeasured observations in
each ‘tail’ known; and III², total number of unmeasured observations known,
but not the number in each ‘tail’. In Section 9, singly truncated samples are
treated as special cases of I and II above.

2. Introduction. In practice, truncated samples arise with various types of
experimental data in which recorded measurements are available over only a
partial range of the variable. Such samples are usually classified according to
the form of the population (complete) distribution; according to whether the
truncation points are known or unknown; and according to whether the number
of unmeasured (missing) observations is known or unknown. In this paper, the
further classification of singly truncated or doubly truncated is made, accordingly
as one or both ‘tails’ of the sample have been removed. Pearson and Lee [1, 2],
Fisher [3], Hald [4]³, and this writer [5] studied singly truncated normal samples
with a known truncation point when the number of unmeasured observations is
unknown. Stevens [6], Cochran [7], and Hald [4] studied similar samples with a
known number of unmeasured observations. Stevens [6] also considered doubly
truncated normal samples with known truncation points when the number of
unmeasured observations in each ‘tail’ is known. In each of these papers,
equations were derived with which maximum likelihood estimates of the population
mean and variance can be computed from samples of the type considered.
With the exception of [5], which uses standard tables of the normal frequency
function, practical application of the various estimating equations involves
use of special tables which may frequently be unavailable.

¹ Based on papers presented before the American Mathematical Society, Durham,
North Carolina, April 2, 1940, and before a joint meeting of the Institute of Mathematical
Statistics and the Biometric Society, Chapel Hill, North Carolina, March 18, 1950.

² The problem involved in this case was recently called to the writer's attention by
Churchill Eisenhart.

³ Reference [4] appeared while this paper was awaiting publication. Minor revisions have
been made in view of Hald's results.


3. Case I. Number of unmeasured observations unknown. Let $x_0'$ designate
the left truncation point, $x_0' + R$ the right truncation point, and hence $R$ the
sample range. Let $n_0$ be the number of measured observations with values equal to
or between the truncation points. In this case, the number of unmeasured
observations is assumed to be unknown. We translate the origin to the left terminus
by the change of variable $x = x' - x_0'$, and designate the left and right truncation
points in standard units of the population (complete distribution) as $\xi'$ and
$\xi''$ respectively. We can write the probability density function for this case as

(1)  $$f(x) = \frac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\; e^{-\frac12 (\xi' + x/\sigma)^2}, \qquad 0 \le x \le R,$$

where

(2)  $$I_0(\xi) = \frac{1}{\sqrt{2\pi}} \int_{\xi}^{\infty} e^{-t^2/2}\, dt,$$

and

(3)  $$\mu = x_0' - \sigma\xi'.$$

Thus $(I_0' - I_0'')$ is the area under the normal curve, between ordinates erected at
$\xi'$ and $\xi''$ respectively. Moreover $(I_0' - I_0'') = P(x_0' \le x' \le x_0' + R)$. The
likelihood function for such a sample is

(4)  $$P(x_1, \ldots, x_{n_0}) = \left[(I_0' - I_0'')\,\sigma\sqrt{2\pi}\right]^{-n_0} \exp\left\{-\frac12 \sum_{i=1}^{n_0} (\xi' + x_i/\sigma)^2\right\}.$$

Since $R$ is the truncated range, and since $\xi'$ and $\xi''$ are in standard units,
we have

(5)  $$\xi'' = \xi' + R/\sigma.$$

It should be understood that $\xi'$ is considered throughout this paper as the
independent parameter of location. The mean, $\mu$, cf. (3), is a linear function of
$\xi'$. In the derivations which follow, we employ the Fisher $I_n$ functions, where
$I_0(\xi)$ is defined by (2) and

(6)  $$I_n(\xi) = \int_{\xi}^{\infty} I_{n-1}(t)\, dt,$$

and hence

$$\frac{dI_n}{d\xi} = -I_{n-1}.$$

These functions satisfy the recurrence formula

(7)  $$(n + 1) I_{n+1} + \xi I_n - I_{n-1} = 0, \qquad n > -1.$$
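The recurrence (7) gives a convenient way to evaluate the $I_n$ functions without special tables. The Python sketch below is an illustration under our own assumptions: it takes $I_{-1}(\xi)$ to be the normal ordinate $\varphi(\xi)$ (consistent with $dI_0/d\xi = -\varphi$) and $I_0(\xi)$ to be the upper-tail normal area, computed here with `math.erfc`. The function names are ours.

```python
import math


def phi(x):
    """Ordinate of the standard normal frequency curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail_area(x):
    """I_0(x): area under the standard normal curve to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fisher_I(n, x):
    """I_n(x) for n >= -1 from (k + 1) I_{k+1} = I_{k-1} - x I_k."""
    previous, current = phi(x), upper_tail_area(x)   # I_{-1}, I_0
    if n == -1:
        return previous
    for k in range(0, n):
        previous, current = current, (previous - x * current) / (k + 1)
    return current


if __name__ == "__main__":
    x = 0.5
    print(fisher_I(1, x), phi(x) - x * upper_tail_area(x))   # the two should agree
```

Forward use of the recurrence is adequate for small $n$; for large $n$ or large $\xi$ it can lose accuracy through cancellation, so this sketch should not be relied on there.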



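$I_n(\xi)$ is ordinarily abbreviated to $I_n$ in this paper. Where no confusion seems
likely to occur, similar abbreviations are used for other functions of $\xi$.

We now obtain certain relations for use in subsequent derivations. Equations
(2), (5), and (6) enable us to write

(8)  $$\frac{\partial I_0'}{\partial \xi'} = -I_{-1}' = -\varphi(\xi'), \qquad \frac{\partial I_0''}{\partial \xi''} = -I_{-1}'' = -\varphi(\xi''), \qquad \frac{\partial I_0''}{\partial \sigma} = -I_{-1}''\, \frac{\partial \xi''}{\partial \sigma},$$

where $\varphi(\xi)$ is the ordinate of the normal frequency curve, i.e.,
$\varphi(\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}$.

Ordinarily we abbreviate $\varphi(\xi')$ to $\varphi'$ and $\varphi(\xi'')$ to $\varphi''$. On
differentiating (5) we have

(9)  $$\frac{\partial \xi''}{\partial \sigma} = -\frac{R}{\sigma^2},$$

and hence from (8)

$$\frac{\partial I_0''}{\partial \sigma} = \varphi''\, \frac{R}{\sigma^2}.$$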
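Taking logarithms of (4), differentiating with the aid of (8) and (9), and
equating to zero, we obtain the maximum likelihood estimating equations

(10)  $$\frac{\partial L}{\partial \xi'} = \frac{n_0(\varphi' - \varphi'')}{I_0' - I_0''} - \sum_{i=1}^{n_0}\left(\xi' + \frac{x_i}{\sigma}\right) = 0,$$
$$\frac{\partial L}{\partial \sigma} = \left(\frac{n_0 \varphi''}{I_0' - I_0''}\right)\frac{R}{\sigma^2} - \frac{n_0}{\sigma} + \frac{1}{\sigma^2}\sum_{i=1}^{n_0} x_i\left(\xi' + \frac{x_i}{\sigma}\right) = 0.$$

If we define

(11)  $$Z_1 = \frac{\varphi'}{I_0' - I_0''}, \qquad Z_2 = \frac{\varphi''}{I_0' - I_0''},$$

and substitute these values in (10), the estimating equations become

(12)  $$\sigma[Z_1 - Z_2 - \xi'] - \nu_1 = 0,$$
$$\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - Z_2 R/\sigma] - \nu_2 = 0,$$

where $\nu_1$ and $\nu_2$ are the first and second sample moments referred to the left
terminus; i.e., $\nu_k = \sum_{i=1}^{n_0} x_i^k / n_0$.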

To obtain the required estimates $\hat{\sigma}$ and $\hat{\xi}'$, it is necessary to solve the
two equations of (12) simultaneously. As illustrated in Section 7, this can be
accomplished without too much difficulty with the aid of the normal curve tables by
using a modified Newton-Raphson method for solving two equations in two
unknowns. This method is described in greater detail by Whittaker and Robinson
[8]. Note that $Z_1$ and $Z_2$, cf. (11), involve only the normal curve ordinates
$\varphi'$ and $\varphi''$ and the areas $I_0'$ and $I_0''$. Consequently they can be evaluated
for any desired values of $\xi'$ and $\sigma$ from standard tables of the normal frequency
function. To determine $\hat{\mu}$, substitute $\hat{\sigma}$ and $\hat{\xi}'$ in (3).

Throughout this paper, we designate the maximum likelihood estimates as
$\hat{\mu}$, $\hat{\sigma}$ and $\hat{\xi}'$ respectively, whereas corresponding population parameters
are designated as $\mu$, $\sigma$, and $\xi'$.


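4. Case II. Number of unmeasured observations in each 'tail' known. Let
the truncation points, the origin of reference, and the number of measured
observations be designated as for Case I. If we let $n_1$ and $n_2$ be the number of
unmeasured observations in the left and right 'tails' respectively, the likelihood
function for a sample of this type is

(13) $P(x_1, \ldots, x_{n_0+n_1+n_2}) = K (1 - I_0')^{n_1} (I_0'')^{n_2}\, \sigma^{-n_0} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right)^2\right]$,

where $K$ is a constant.

We take the logarithms of (13), differentiate with the help of (8) and (9), and
equate to zero to obtain the maximum likelihood estimating equations

(14) $\dfrac{\partial L}{\partial \xi'} = \dfrac{n_1 \varphi'}{1 - I_0'} - \dfrac{n_2 \varphi''}{I_0''} - \sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0$,

$\dfrac{\partial L}{\partial \sigma} = \left(\dfrac{n_2 \varphi''}{I_0''}\right)\dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma^2}\sum_{i=1}^{n_0} x_i\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0$.

Let

(15) $Y_1 = \dfrac{n_1}{n_0}\,\dfrac{\varphi'}{1 - I_0'}, \qquad Y_2 = \dfrac{n_2}{n_0}\,\dfrac{\varphi''}{I_0''}$,

and (14) can be written as

(16) $\sigma[Y_1 - Y_2 - \xi'] - \nu_1 = 0$,

$\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - Y_2 R/\sigma] - \nu_2 = 0$,

where $\nu_1$ and $\nu_2$ are again the first and second sample moments referred to the
left terminus. The estimating equations (16) correspond to equations (12)
given for Case I, and the manner of solution is the same for both cases. $Y_1$ and
$Y_2$ for a given sample are functions of $\xi'$ and $\sigma$ only. They can be evaluated for
any desired values of these variables from ordinary normal curve tables. As in
Case I, the mean is estimated from (3).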


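5. Case III. Total number of unmeasured observations known, but not the
number in each tail. Again, let the truncation points, the origin of reference,
and the number of measured observations be designated as in the two previous
cases. Let $N$ be the total sample size and hence $N - n_0$ the combined number of
unmeasured observations in both tails. In the notation of Case II, $N - n_0 = n_1 + n_2$.
The likelihood function for a sample of this type is

(17) $P(x_1, \ldots, x_N) = K (1 - I_0' + I_0'')^{N - n_0}\, \sigma^{-n_0} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right)^2\right]$.

Taking logarithms of (17), differentiating with the assistance of (8) and (9) and
equating to zero, we obtain the maximum likelihood estimating equations

(18) $\dfrac{\partial L}{\partial \xi'} = \dfrac{(N - n_0)(\varphi' - \varphi'')}{1 - I_0' + I_0''} - \sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0$,

$\dfrac{\partial L}{\partial \sigma} = \left(\dfrac{(N - n_0)\varphi''}{1 - I_0' + I_0''}\right)\dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma^2}\sum_{i=1}^{n_0} x_i\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0$.

In this instance, let

(19) $Q_1 = \dfrac{N - n_0}{n_0}\,\dfrac{\varphi'}{1 - I_0' + I_0''}, \qquad Q_2 = \dfrac{N - n_0}{n_0}\,\dfrac{\varphi''}{1 - I_0' + I_0''}$,

and (18) can be written as

(20) $\sigma[Q_1 - Q_2 - \xi'] - \nu_1 = 0$,

$\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - Q_2 R/\sigma] - \nu_2 = 0$.

It will be recognized that equations (20) correspond to (12) and (16) for Cases
I and II respectively. Since the manner of solving the estimating equations is
identical in all three cases, it will not be discussed further here. For any given
sample, $Q_1$ and $Q_2$ are functions of $\xi'$ and $\sigma$ only, and they can be evaluated for
any desired values of these arguments from standard normal curve tables. In
this case also, the mean is estimated from equation (3).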


6. First approximations.

Case I. In this case, the following relations will usually provide satisfactory
first approximations for estimating $\sigma$ and $\xi'$:

(21) $\sigma_1 = s, \qquad \xi_1' = -\nu_1/s$,

where $s^2$ is the sample variance, i.e., $s^2 = \nu_2 - \nu_1^2$. It should be remarked
that the only penalty involved in beginning with a poor first approximation is
to increase slightly the number of steps necessary before arriving at a satisfactory
final approximation by the method of Section 7.

Case II. Since $n_1$ and $n_2$ are known in this case, it is more expedient to read
first approximations to $\xi'$ and $\xi''$ directly from standard tables of normal curve
areas where we set

(22) $\dfrac{n_1}{n_0 + n_1 + n_2} = 1 - I_0' = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\xi'} e^{-t^2/2}\,dt$,

and

(23) $\dfrac{n_2}{n_0 + n_1 + n_2} = I_0'' = \dfrac{1}{\sqrt{2\pi}}\int_{\xi''}^{\infty} e^{-t^2/2}\,dt$.

With $\xi_1'$ and $\xi_1''$ determined from (22) and (23), we obtain a first approximation
for estimating $\sigma$ from equation (5), which we now write as

(24) $\sigma_1 = R/(\xi_1'' - \xi_1')$.

Case III. For a first approximation in this case, it will usually be satisfactory,
in the absence of contrary information, to assume that the unmeasured observations
are divided equally between the two tails, and then proceed as in Case II.
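The first approximations of this section can be sketched in a few lines, with the standard library's inverse normal distribution function standing in for the printed tables of normal curve areas. The function names are illustrative assumptions; the inputs are the sample values quoted in Section 7 ($n_0 = 32$, $\nu_1 = 1.244625$, $\nu_2 = 2.105275$, $R = 2.75$), and the tail counts must be positive for the table look-up to be defined.

```python
from statistics import NormalDist

def first_approx_case1(nu1, nu2):
    """Relations (21): sigma_1 = s and xi'_1 = -nu_1 / s."""
    s = (nu2 - nu1 ** 2) ** 0.5
    return s, -nu1 / s

def first_approx_known_tails(n0, n1, n2, R):
    """Relations (22)-(24) when the tail counts n1 and n2 are known (both > 0)."""
    N = n0 + n1 + n2
    std = NormalDist()                    # standard normal, replaces the area tables
    xi1 = std.inv_cdf(n1 / N)             # area below xi' equals n1/N
    xi2 = std.inv_cdf(1.0 - n2 / N)       # area above xi'' equals n2/N
    sigma1 = R / (xi2 - xi1)              # equation (24)
    return xi1, xi2, sigma1

print(first_approx_case1(1.244625, 2.105275))        # Case I
print(first_approx_known_tails(32, 7, 1, 2.75))      # Case II
print(first_approx_known_tails(32, 4, 4, 2.75))      # Case III, tails split equally
```

Small differences from the figures quoted in Section 7 are to be expected, since the paper rounds $s$ and the tabled deviates before dividing.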


7. Numerical examples. As previously mentioned, a modified Newton-Raphson
method for solving two equations in two unknowns is satisfactory in
each of the three cases considered, for solving the estimating equations to obtain
$\hat\sigma$ and $\hat\xi'$ in practical applications. A random sample from a normal population
with $\mu = 0$ and $\sigma = 1$, selected from Mahalanobis's tables [9], will serve to
illustrate the solution in each case.

Case I. For the sample selected, $n_0 = 32$; $\nu_1 = 1.244625$; $\nu_2 = 2.105275$;
$x_0' = -1.000000$; and $R = 2.750000$. The estimating equations to be solved
simultaneously for $\xi'$ and $\sigma$ are thus

$\sigma[Z_1 - Z_2 - \xi'] - 1.244625 = 0$,

$\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - 2.750000\, Z_2/\sigma] - 2.105275 = 0$.

For first approximations, we employ (21) to obtain: $\sigma_1 = s = 0.75$; and $\xi_1' =
-1.244625/0.75 = -1.66$. Beginning with these approximations, we subsequently
obtain the results displayed in Table 1.

TABLE 1

Solution of estimating equations in Case I

   $\sigma$        $\xi'$ from $\nu_1$   $\xi'$ from $\nu_2$   Difference
1.536313        -0.5389           -0.5387          -0.0002
1.527778        -0.5455           -0.5460          +0.0006

Interpolating in this table, we obtain $\hat\sigma = 1.534$ and $\hat\xi' = -0.541$. On substituting
these values in (3) we obtain $\hat\mu = -0.170$. Even though the first approximations
in this instance proved to be considerably in error, no appreciable increase was
experienced in the number of steps necessary to arrive at the final values given.
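The Case I computation can be reproduced numerically. The sketch below does not implement the modified Newton-Raphson scheme of the paper; as a swapped-in and deliberately simple alternative it mirrors the layout of Table 1: for a trial $\sigma$ the first equation of (12) is solved for $\xi'$ by bisection, and $\sigma$ is then adjusted by a second bisection until the second equation is also satisfied. All function names and the search brackets ($-6 \le \xi' \le 6$, $0.5 \le \sigma \le 5$) are assumptions chosen for this sample.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def I0(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def residuals(sigma, xi1, nu1, nu2, R):
    """Left-hand sides of the two estimating equations (12)."""
    xi2 = xi1 + R / sigma                     # equation (5)
    area = I0(xi1) - I0(xi2)                  # proportion of the population measured
    Z1, Z2 = phi(xi1) / area, phi(xi2) / area
    A = Z1 - Z2 - xi1
    return (sigma * A - nu1,
            sigma ** 2 * (1.0 - xi1 * A - Z2 * R / sigma) - nu2)

def bisect(func, lo, hi, iters=100):
    """Plain bisection; requires a sign change of func on [lo, hi]."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root not bracketed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def xi_from_first(sigma, nu1, nu2, R):
    """Value of xi' satisfying the first equation of (12) for a trial sigma."""
    return bisect(lambda x: residuals(sigma, x, nu1, nu2, R)[0], -6.0, 6.0)

def solve_case1(nu1, nu2, R, sigma_lo=0.5, sigma_hi=5.0):
    """Adjust sigma until the second equation of (12) holds as well."""
    g = lambda s: residuals(s, xi_from_first(s, nu1, nu2, R), nu1, nu2, R)[1]
    sigma = bisect(g, sigma_lo, sigma_hi)
    return sigma, xi_from_first(sigma, nu1, nu2, R)

sigma_hat, xi_hat = solve_case1(1.244625, 2.105275, 2.75)
print(sigma_hat, xi_hat)
```

The printed pair should lie close to the interpolated values reported for Case I; bisection is slower than Newton-Raphson but needs no derivatives and no good starting point.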

Case II. Solution of estimating equations (16) for this case can also be
illustrated with the same sample which was used in Case I. In this instance, however,
we have the additional information: $n_1 = 7$ and $n_2 = 1$. The equations to be
solved are:

$\sigma[Y_1 - Y_2 - \xi'] - 1.244625 = 0$,

$\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - 2.750000\, Y_2/\sigma] - 2.105275 = 0$.

From (22), (23) and (24) we obtain the first approximations: $\xi_1' = -0.935$;
$\xi_1'' = 1.960$; and hence $\sigma_1 = 0.950$. Beginning with these values, we proceed as
in Case I, and after several trials obtain the results displayed in Table 2.

TABLE 2

Solution of estimating equations in Case II

   $\sigma$        $\xi'$ from $\nu_1$   $\xi'$ from $\nu_2$   Difference
1.041667        -0.9381           -0.9360          -0.0021
1.000000        -0.9820           -1.0094          +0.0274

Interpolating, we have $\hat\sigma = 1.039$ and $\hat\xi' = -0.941$. From (3) we then obtain
$\hat\mu = -0.022$.

Case III. Again we use the same sample that was employed to illustrate
Cases I and II. In this instance, however, we assume that the only information
available about the unmeasured observations is that their total number is 8.
In the notation of Section 5, we have $N = 40$, $n_0 = 32$, and hence $N - n_0 = 8$.
The estimating equations in this situation are

$\sigma[Q_1 - Q_2 - \xi'] - 1.244625 = 0$,

$\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - 2.750000\, Q_2/\sigma] - 2.105275 = 0$.

Under the assumption that 4 unmeasured observations are in each 'tail', equations
(22), (23) and (24) give first approximations: $\xi_1' = -1.28$; $\xi_1'' = 1.28$;
and hence $\sigma_1 = 1.074$. Starting with these values and proceeding as in the two
previous cases, we obtain the results displayed in Table 3.

TABLE 3

Solution of estimating equations in Case III

   $\sigma$        $\xi'$ from $\nu_1$   $\xi'$ from $\nu_2$   Difference
1.000000        -1.0794           -1.2091          +0.1297
1.100000        -1.0118           -0.9730          -0.0379

By interpolation, we have $\hat\sigma = 1.077$ and $\hat\xi' = -1.027$. From equation (3),
we then compute $\hat\mu = 0.106$.
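A quick way to check the three worked examples is to substitute the reported estimates back into the estimating equations (12), (16) and (20) and look at the residuals. The sketch below does this under two stated assumptions: equation (3) is taken as $\mu = x_0' - \sigma\xi'$ with $x_0' = -1$, and the Case II estimate of $\sigma$ is taken as 1.039, the value consistent with the reported mean of $-0.022$. Function and variable names are illustrative.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def I0(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

NU1, NU2, R, N0 = 1.244625, 2.105275, 2.75, 32   # sample values of Section 7

def ratios(case, sigma, xi1, n1=None, n2=None, N=None):
    """(Z1, Z2), (Y1, Y2) or (Q1, Q2) as defined in (11), (15) and (19)."""
    xi2 = xi1 + R / sigma
    p1, p2 = phi(xi1), phi(xi2)
    if case == "I":
        d = I0(xi1) - I0(xi2)
        return p1 / d, p2 / d
    if case == "II":
        return (n1 / N0) * p1 / (1.0 - I0(xi1)), (n2 / N0) * p2 / I0(xi2)
    d = 1.0 - I0(xi1) + I0(xi2)                  # Case III
    c = (N - N0) / N0
    return c * p1 / d, c * p2 / d

def residuals(case, sigma, xi1, **kw):
    """Left-hand sides of the pair of estimating equations for the given case."""
    a, b = ratios(case, sigma, xi1, **kw)
    A = a - b - xi1
    return (sigma * A - NU1,
            sigma ** 2 * (1.0 - xi1 * A - b * R / sigma) - NU2)

reported = {
    "I":   (1.534, -0.541, {}),
    "II":  (1.039, -0.941, {"n1": 7, "n2": 1}),
    "III": (1.077, -1.027, {"N": 40}),
}
for case, (sigma, xi1, kw) in reported.items():
    r1, r2 = residuals(case, sigma, xi1, **kw)
    mu = -1.0 - sigma * xi1                      # assumed form of equation (3)
    print(case, round(r1, 4), round(r2, 4), round(mu, 3))
```

Residuals of a few thousandths are expected here, because the reported estimates are rounded to three decimals.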


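8. Precision of estimates. To determine asymptotic variances of $\hat\sigma$ and $\hat\xi'$, we
construct the variance-covariance matrices. This requires the
second partial derivatives of logarithms of the likelihood function in each of
the three cases considered. Results stated in (8) and (9) are involved in these
derivatives.

Case I. The second partial derivatives in this case are

(25) $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0 f_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \dfrac{n_0}{\sigma} f_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2} f_3(\xi', \xi'')$,

where

(26) $f_1(\xi', \xi'') = -[1 + \xi' Z_1 - \xi'' Z_2 - (Z_1 - Z_2)^2]$,

$f_2(\xi', \xi'') = \dfrac{R}{\sigma} Z_2[(Z_1 - Z_2) - \xi''] + [Z_1 - Z_2 - \xi']$,

$f_3(\xi', \xi'') = \left(\dfrac{R}{\sigma}\right)^2 Z_2 (Z_2 + \xi'') - [2 - \xi'(Z_1 - Z_2 - \xi') - Z_2 R/\sigma]$.

Subsequently we obtain

(27) $V(\hat\sigma) = -\dfrac{\sigma^2}{n_0}\left[\dfrac{f_1}{f_1 f_3 - f_2^2}\right], \qquad V(\hat\xi') = -\dfrac{1}{n_0}\left[\dfrac{f_3}{f_1 f_3 - f_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{f_2}{\sqrt{f_1 f_3}}$.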
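To make the Case I precision formulas concrete, the sketch below evaluates $f_1$, $f_2$, $f_3$ and the resulting asymptotic variances for given $\sigma$, $\xi'$, $R$ and $n_0$. It assumes the forms $f_1 = -[1 + \xi' Z_1 - \xi'' Z_2 - (Z_1 - Z_2)^2]$, $f_2 = (R/\sigma) Z_2[(Z_1 - Z_2) - \xi''] + [Z_1 - Z_2 - \xi']$ and $f_3 = (R/\sigma)^2 Z_2(Z_2 + \xi'') - [2 - \xi'(Z_1 - Z_2 - \xi') - Z_2 R/\sigma]$, obtained by differentiating the estimating equations (12); the function name is illustrative. A useful sanity check is the limit of negligible truncation, where $V(\hat\sigma)$ should approach the familiar $\sigma^2/(2 n_0)$.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def I0(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def case1_precision(sigma, xi1, R, n0):
    """Asymptotic V(sigma), V(xi') and their correlation for Case I."""
    k = R / sigma
    xi2 = xi1 + k
    d = I0(xi1) - I0(xi2)
    Z1, Z2 = phi(xi1) / d, phi(xi2) / d
    A = Z1 - Z2 - xi1
    f1 = -(1.0 + xi1 * Z1 - xi2 * Z2 - (Z1 - Z2) ** 2)
    f2 = k * Z2 * ((Z1 - Z2) - xi2) + A
    f3 = k * k * Z2 * (Z2 + xi2) - (2.0 - xi1 * A - Z2 * k)
    det = f1 * f3 - f2 * f2
    var_sigma = -(sigma ** 2 / n0) * f1 / det
    var_xi = -(1.0 / n0) * f3 / det
    corr = f2 / math.sqrt(f1 * f3)
    return var_sigma, var_xi, corr

# Values reported for the Case I example of Section 7.
print(case1_precision(1.534, -0.541, 2.75, 32))
# Practically untruncated sample: V(sigma) should be close to sigma^2 / (2 n0).
print(case1_precision(1.0, -8.0, 16.0, 100))
```

For the heavily truncated example sample the variances come out large relative to an untruncated sample of the same size, which is the qualitative behaviour one would expect.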


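Case II. In this case the second partial derivatives are

(28) $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0 g_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \dfrac{n_0}{\sigma} g_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2} g_3(\xi', \xi'')$,

where

(29) $g_1(\xi', \xi'') = -\left[1 + \xi' Y_1 - \xi'' Y_2 + \dfrac{n_0}{n_1} Y_1^2 + \dfrac{n_0}{n_2} Y_2^2\right]$,

$g_2(\xi', \xi'') = \dfrac{R}{\sigma} Y_2\left[\dfrac{n_0}{n_2} Y_2 - \xi''\right] + [Y_1 - Y_2 - \xi']$,

$g_3(\xi', \xi'') = -\left(\dfrac{R}{\sigma}\right)^2 Y_2\left[\dfrac{n_0}{n_2} Y_2 - \xi''\right] - [2 - \xi'(Y_1 - Y_2 - \xi') - Y_2 R/\sigma]$.

Finally we can write

(30) $V(\hat\sigma) = -\dfrac{\sigma^2}{n_0}\left[\dfrac{g_1}{g_1 g_3 - g_2^2}\right], \qquad V(\hat\xi') = -\dfrac{1}{n_0}\left[\dfrac{g_3}{g_1 g_3 - g_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{g_2}{\sqrt{g_1 g_3}}$.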

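Case III. This time, the second partial derivatives are

(31) $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0 h_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \dfrac{n_0}{\sigma} h_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2} h_3(\xi', \xi'')$,

where

(32) $h_1(\xi', \xi'') = -\left[1 + \xi' Q_1 - \xi'' Q_2 + \dfrac{n_0}{N - n_0}(Q_1 - Q_2)^2\right]$,

$h_2(\xi', \xi'') = -\dfrac{R}{\sigma} Q_2\left[\dfrac{n_0}{N - n_0}(Q_1 - Q_2) + \xi''\right] + [Q_1 - Q_2 - \xi']$,

$h_3(\xi', \xi'') = -\left(\dfrac{R}{\sigma}\right)^2 Q_2\left[\dfrac{n_0}{N - n_0} Q_2 - \xi''\right] - [2 - \xi'(Q_1 - Q_2 - \xi') - Q_2 R/\sigma]$.

Accordingly we obtain

(33) $V(\hat\sigma) = -\dfrac{\sigma^2}{n_0}\left[\dfrac{h_1}{h_1 h_3 - h_2^2}\right], \qquad V(\hat\xi') = -\dfrac{1}{n_0}\left[\dfrac{h_3}{h_1 h_3 - h_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{h_2}{\sqrt{h_1 h_3}}$.

Note that variances of the estimates for each case considered can be computed
for given values of $\xi'$ and $\sigma$ from standard normal tables of areas and ordinates.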

9. Singly truncated samples. If only the left 'tail' is missing from the samples
thus far considered, then $\xi'' = \infty$, $n_2 = 0$, $\varphi'' = 0$, $I_0'' = 0$, and hence $Z_2$, $Y_2$,
and $Q_2$ each equal zero. Upon substituting these values in (12), (16), and (20)
respectively, estimating equations applicable to singly truncated samples are
obtained as special cases of the estimating equations for doubly truncated
samples. Of course Cases II and III become identical when samples are singly
truncated. When $Y_2 = Q_2 = 0$, then $Y_1 = Q_1$, cf. (15) and (19).

Case I. With $Z_2 = 0$, the estimating equations (12) become

(34) $\sigma[Z_1 - \xi'] = \nu_1$,

$\sigma^2[1 - \xi'(Z_1 - \xi')] = \nu_2$.

Eliminating $\sigma$ between these two equations we have

(35) $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Z_1 - \xi'}\left[\dfrac{1}{Z_1 - \xi'} - \xi'\right]$,

which is recognized as the Pearson-Lee-Fisher equation in a form which was
previously given by the author [5].

Case II. With $Y_2 = 0$, the estimating equations (16) become

(36) $\sigma[Y_1 - \xi'] = \nu_1$,

$\sigma^2[1 - \xi'(Y_1 - \xi')] = \nu_2$.

Eliminating $\sigma$ between the above equations, we obtain

(37) $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Y_1 - \xi'}\left[\dfrac{1}{Y_1 - \xi'} - \xi'\right]$,

which is in a form completely analogous to (35). Furthermore, this equation
can be solved for $\hat\xi'$ in the same manner as (35), cf. [5]. Since $\sigma$ can be eliminated
between estimating equations in singly truncated cases, but not in doubly
truncated cases, the numerical computations are much simpler and less laborious
for singly truncated samples.

If the right rather than the left tail is missing from singly truncated samples,
applicable estimating equations can be obtained from (12) and (16) by translating
the origin to the terminus on the right and setting $Z_1$ and $Y_1$ equal to zero
rather than $Z_2$ and $Y_2$.

The variance formulas (25) and (28) likewise assume more simple forms with
singly truncated samples. Substitute $Z_2 = 0$ in (25) and the variance formulas
applicable with singly truncated samples when the number of unmeasured
observations is unknown become identical in form with those previously given
by the writer in [5]. When the number of unmeasured observations in a singly
truncated sample is known, the applicable variance formulas (28), on setting
$Y_2 = 0$, become

(38) $V(\hat\sigma) = \dfrac{\sigma^2}{n_0}\, W(\xi')$ and $V(\hat\xi') = \dfrac{1}{n_0}\, w(\xi')$,

where $W$ and $w$ may be regarded as weighting functions defined by

(39) $W(\xi') = \dfrac{1 + Y_1[(n_0/n_1) Y_1 + \xi']}{[2 - \xi'(Y_1 - \xi')]\{1 + Y_1[(n_0/n_1) Y_1 + \xi']\} - [Y_1 - \xi']^2}$,

(40) $w(\xi') = \dfrac{2 - \xi'(Y_1 - \xi')}{[2 - \xi'(Y_1 - \xi')]\{1 + Y_1[(n_0/n_1) Y_1 + \xi']\} - [Y_1 - \xi']^2}$.

Similarly, the correlation between sampling errors of $\hat\sigma$ and $\hat\xi'$ in this case becomes

(41) $r_{\hat\sigma, \hat\xi'} = \dfrac{Y_1 - \xi'}{\sqrt{[2 - \xi'(Y_1 - \xi')]\{1 + Y_1[(n_0/n_1) Y_1 + \xi']\}}}$.
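The weighting functions for a singly truncated sample with known count can be tabulated directly. The sketch below assumes the forms $W = b/(ab - c^2)$, $w = a/(ab - c^2)$ and $r = c/\sqrt{ab}$ with $a = 2 - \xi'(Y_1 - \xi')$, $b = 1 + Y_1[(n_0/n_1)Y_1 + \xi']$ and $c = Y_1 - \xi'$, and it replaces the sample ratio by the population area $I_0(\xi')$, so that $Y_1 = \varphi/I_0$ and $(n_0/n_1)Y_1 = \varphi/(1 - I_0)$. The function name is illustrative. With almost nothing truncated ($\xi'$ very negative), $W$ should tend to $1/2$ and $w$ to $1 + \xi'^2/2$.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def weights_known_count(xi1):
    """W, w and r for left truncation at xi' when the missing count is known."""
    p = phi(xi1)
    measured = 0.5 * math.erfc(xi1 / math.sqrt(2.0))     # I_0(xi')
    missing = 0.5 * math.erfc(-xi1 / math.sqrt(2.0))     # 1 - I_0(xi')
    Y1 = p / measured               # (n1/n0) phi / (1 - I0) with n1/n0 = (1 - I0)/I0
    hazard = p / missing            # (n0/n1) Y1
    a = 2.0 - xi1 * (Y1 - xi1)
    b = 1.0 + Y1 * (hazard + xi1)
    c = Y1 - xi1
    den = a * b - c * c
    return b / den, a / den, c / math.sqrt(a * b)

for xi in (-3.0, -2.0, -1.0, 0.0, 1.0):
    W, w, r = weights_known_count(xi)
    print(xi, round(W, 4), round(w, 4), round(r, 4))
```

Multiplying $W$ by $\sigma^2/n_0$ and $w$ by $1/n_0$ gives the variances in the form used by the text.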

A comparison of the variances (38) with those applicable when the number of
unmeasured observations is unknown serves to indicate the extent to which
information contained in a singly truncated sample is increased by adding
knowledge of the number of unmeasured observations. To facilitate such comparisons,
$W$, $w$, and corresponding functions $W'$ and $w'$ applicable when the
number of unmeasured observations is unknown, are displayed graphically in
Figures 1 and 2. In computing the plotted values of $W$ and $w$, the ratio $n/N$
in (39) and (40) was replaced by $I_0$. This ratio is, of course, an estimate of $I_0$,
and for $n$ and $N$ sufficiently large, the substitution is amply justified. Equations
for $W'$ and $w'$ can be found in [5]. For further comparisons, a graph of $w''$ applicable
in determining the variance $V(\xi^*)$, where $\xi^*$ is estimated from $n/N$ alone,
is also included in Figure 2. This latter function is defined as

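(42) $w''(\xi^*) = \dfrac{I_0^2 (1 - I_0)}{\varphi^2}$.

It follows from the well known formula for the variance of $\xi^*$:

$V(\xi^*) = \dfrac{1}{N}\,\dfrac{I_0 (1 - I_0)}{\varphi^2} = \dfrac{1}{n}\,\dfrac{I_0^2 (1 - I_0)}{\varphi^2}$.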


An examination of Figures 1 and 2 discloses that except when the omitted
portion of the distribution is small ($\xi' < -3$), the variances of the estimates of
$\sigma$ and $\xi'$ based on singly truncated normal samples are substantially less when
the number of unmeasured observations is known than when this information
is lacking.





REFERENCES

[1] K. Pearson and A. Lee, "On the generalized probable error in multiple normal correlation", Biometrika, Vol. 6 (1908), pp. 59-68.

[2] A. Lee, "Table of Gaussian 'tail' functions when the 'tail' is larger than the body", Biometrika, Vol. 10 (1915), pp. 208-215.

[3] R. A. Fisher, "Properties and applications of Hh functions", Mathematical Tables, Vol. 1, pp. xxvi-xxxv, British Association for the Advancement of Science, 1931.

[4] A. Hald, "Maximum likelihood estimation of the parameters of a normal distribution which is truncated at a known point", Skandinavisk Aktuarietidskrift, Vol. 32 (1949), pp. 119-134.

[5] A. C. Cohen, Jr., "On estimating the mean and standard deviation of truncated normal distributions", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 518-525.

[6] W. L. Stevens, "The truncated normal distribution", appendix to "The Calculation of the Time-Mortality Curve" by C. I. Bliss, Annals of Applied Biology, Vol. 24 (1937), pp. 815-852.

[7] W. G. Cochran, "Use of IBM equipment in an investigation of the truncated normal problem", Proc. Research Forum, International Business Machines Corp., 1946, pp. 40-43.

[8] E. T. Whittaker and G. Robinson, The Calculus of Observations, Second Ed., Blackie and Son, Ltd., London and Glasgow, 1929, pp. 88-91.

[9] P. C. Mahalanobis, "Tables of random samples from a normal population", Sankhya, Vol. 1 (1934), pp. 289-328.



THE ASYMPTOTIC PROPERTIES OF ESTIMATES OF THE 
PARAMETERS OF A SINGLE EQUATION IN A COMPLETE 
SYSTEM OF STOCHASTIC EQUATIONS 1, 2

By T. W. Anderson 3 and Herman Rubin 4
Columbia University and Institute for Advanced Study

1. Summary. In a previous paper [2] the authors have given a method for
estimating the coefficients of a single equation in a complete system of linear
stochastic equations. In the present paper the consistency of the estimates and
the asymptotic distributions of the estimates and the test criteria are studied
under conditions more general than those used in the derivation of these estimates
and criteria. The point estimates, which can be obtained as maximum likelihood
estimates under certain assumptions including that of normality of disturbances,
are consistent even if the disturbances are not normally distributed and (a) some
predetermined variables are neglected (Theorem 1) or (b) the single equation is
in a non-linear system with certain properties (Theorem 2).

Under certain general conditions (normality of the disturbances not being
required) the estimates are asymptotically normally distributed (Theorems 3
and 4). The asymptotic covariance matrix is given for several cases. The criteria
derived in [2] for testing the hypothesis of over-identification have, asymptotically,
$\chi^2$-distributions (Theorem 5). The exact confidence regions developed
in [2] for the case that all predetermined variables are exogenous (that is, that
the difference equations are of zero order) are shown to be consistent and to hold
asymptotically even when this assumption is not true (Theorem 6).

2. Introduction. The complete system of linear stochastic equations considered
by the authors in [2] was written

(2.1) $B_{yy}\, y_t' + \Gamma_{yz}\, z_t' = \epsilon_t'$,

where $y_t$ is a row vector of $G$ jointly dependent variables at "time" $t$, $z_t$ is a row
vector of $K$ variables predetermined at $t$, and $\epsilon_t$ is a row vector of "disturbances,"
and $B_{yy}$ and $\Gamma_{yz}$ are matrices. If $B_{yy}$ is non-singular the distribution of $\epsilon_t$ induces
the distribution of $y_t$ given $z_t$.

One component equation of (2.1) was given special treatment. Let $\beta$ be

1 This paper will be included in Cowles Commission Papers, New Series, No. 30.

2 The results of this paper were presented to meetings of the Institute of Mathematical
Statistics at Washington, D. C., April 12, 1940 (Washington Chapter) and at Ithaca, New
York, August 23, 1946. Most of the research was done at the Cowles Commission for Research
in Economics; the authors are indebted to the members of the Cowles Commission
staff for many helpful discussions.

3 Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant
of the Cowles Commission for Research in Economics.

4 National Research Fellow; Research Consultant of the Cowles Commission for Research
in Economics.


ASYMPTOTIC PROPERTIES OF CERTAIN ESTIMATES


composed of the coefficients of the coordinates of $y_t$ which are not assumed
zero in the specified equation, and let $x_t$ be composed of the corresponding
components of $y_t$; similarly let $\gamma$ be composed of the coefficients of the coordinates
of $z_t$ which are not assumed zero, and $u_t$ the corresponding components of $z_t$;
and let $\zeta_t$ be the component of $\epsilon_t$ associated with the specified equation. Then
the single equation is

(2.2) $\beta x_t' + \gamma u_t' = \zeta_t$.

Suppose we have a set of observations $x_t$, $z_t$, $t = 1, \ldots, T$. For sets of any
two vectors $a_t$ and $b_t$, let the second-order moment matrix be

(2.3) $M_{ab} = \dfrac{1}{T}\sum_{t=1}^{T} a_t' b_t$.

Let $s_t$ be some linear transform of $v_t$, the set of coordinates of $z_t$ not contained in
$u_t$, chosen so $M_{su} = 0$. Defining

(2.4) $W_{xx} = M_{xx} - M_{xz} M_{zz}^{-1} M_{zx}$,

and assuming $\epsilon_t$ normally distributed with mean 0, covariance matrix $\Sigma$, and
independently of $\epsilon_{t'}$ $(t \neq t')$, we find the maximum likelihood estimate of $\beta$
to be proportional to a vector defined by

(2.5) $(M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx})\, b' = 0$,

taking $\nu$ as the smallest root of

(2.6) $|M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx}| = 0$.

The vector is normalized by

(2.7) $\hat\beta\, \hat\Phi_{xx}\, \hat\beta' = 1$,

where $\hat\Phi_{xx}$ may be a function of the estimates of other parameters. The estimate
of $\gamma$ is $\hat\gamma = -\hat\beta M_{xu} M_{uu}^{-1}$ [2; Theorem 1]. These estimates were derived under the
following explicit Assumptions A, B, C, and D:

Assumption A. The selected structural equation (2.2) is one equation of a complete
linear system of stochastic equations. It is identified by the fact that if $H$ is the
number of coordinates in $x_t$, there are at least $H - 1$ coordinates in $v_t$, the vector of
predetermined variables in the system, but missing in (2.2).

Assumption B. At time $t$ all of the coordinates of $z_t = (u_t, v_t)$ are given.

Assumption C. The coordinates of $z_t$ are given functions of exogenous variables
and of coordinates of $y_{t-1}, y_{t-2}, \ldots$. If coordinates of $y_0, y_{-1}, \ldots$ are involved in
$z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is non-singular
with probability one.

Assumption D. The disturbance vectors $\epsilon_t$ are distributed serially independently
and normally with mean zero and covariance matrix $\Sigma$.

Under these assumptions it is found that $(1 + \nu)^{-T/2}$ is the likelihood ratio



T. W. ANDERSON AND HERMAN RUBIN


criterion for testing the hypothesis that the number of components of $z_t$ assumed
to have zero coefficients is so great.

If there are no lagged endogenous variables in $z_t$, we can find confidence
regions for $\beta$ and for $\beta$ and $\gamma$ simultaneously as well as an approximate test for
the above hypothesis. The assumptions used for these results are A, B, and

Assumption E. All the coordinates of $z_t = (u_t, v_t)$ are exogenous. The moment
matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are distributed
independently and normally with mean zero and variance $\sigma^2$.

Assumptions A and B are used in this paper and a number in addition,
which will be lettered similarly. It is to be emphasized that the various assumptions
are used alternatively, never all at once; in fact many assumptions are
mutually exclusive.

3. Consistency of the estimates. The estimates $\hat\beta$ and $\hat\gamma$ are consistent not
only in the case for which they are maximum likelihood estimates, but also in
cases in which the disturbances are not normally or even identically distributed.
Moreover, for consistency of the estimates it is not necessary that the investigator
know all of the components of $v_t$ or use them. Another direction in which the
assumptions may be relaxed is to permit the other equations in the system to be
non-linear.

3.1. The linear case. This case is characterized by Assumption A. We need
also to assume:

Assumption F. $M_{zz}$ converges to a fixed non-singular limit $R$ in probability.

Let $u_t$ consist of the part of $z_t$ that enters the selected structural equation (2.2).
The remainder of the components of $z_t$ are divided into two groups as to whether
they are known or not. Let $s_t$ be a linear transform of the known components
not entering the specified equation such that

(3.1) $\operatorname{plim}_{T \to \infty} M_{us} = 0$,

and let $r_t$ be a linear transform of the components of $z_t$ not known such that

(3.2) $\operatorname{plim}_{T \to \infty} M_{ur} = 0$,

(3.3) $\operatorname{plim}_{T \to \infty} M_{sr} = 0$.

The relevant part of the "reduced form," obtained from (2.1) by multiplication
by $B_{yy}^{-1}$, is

(3.4) $x_t' = \Pi_{xu} u_t' + \Pi_{xs} s_t' + \Pi_{xr} r_t' + \delta_t'$.

The matrix $(\Pi_{xs}\ \Pi_{xr})$ is $\Pi^*_{xv}$ (defined in [2]) multiplied on the right by a non-singular
matrix; hence, $\beta \Pi_{xs} = 0$, and similarly $\beta \Pi_{xu} = -\gamma$. We shall find it
convenient to assume

Assumption G. $\Pi_{xs}$ has rank $H - 1$.

This means that for $T$ sufficiently large the probability is arbitrarily near 1
that (2.2) is identified.





However, these conditions still do not insure consistency. We need the asymptotic
analogue of lack of correlation:

Assumption H. $\operatorname{plim}_{T \to \infty} \dfrac{1}{T}\sum_{t=1}^{T} \delta_t' z_t = 0$.

We do not need to require that the covariance matrices of $\delta_t$ are the same or
even that they exist. We shall make an assumption about

(3.5) $W^*_{xx} = M_{xx} - (M_{xu}\ M_{xs}) \begin{pmatrix} M_{uu} & M_{us} \\ M_{su} & M_{ss} \end{pmatrix}^{-1} \begin{pmatrix} M_{ux} \\ M_{sx} \end{pmatrix}$.

Assumption I. The ratio of the largest to the smallest characteristic roots of $W^*_{xx}$
is bounded in probability.

This means that for a suitable constant $K$

(3.6) $\lim_{T \to \infty} P\left\{\dfrac{l(W^*_{xx})}{s(W^*_{xx})} > K\right\} = 0$,

where $P(E)$ denotes the probability of event $E$ and $s(A)$ and $l(A)$ are the smallest
and largest roots of the matrix $A$, respectively.

Assumptions F and H imply that $P_{xu} \to \Pi_{xu}$ and $P_{xs} \to \Pi_{xs}$ in probability,
where $P_{xu} = M_{xu} M_{uu}^{-1}$ and $P_{xs}$ is the part of

(3.7) $(M_{xu}\ M_{xs}) \begin{pmatrix} M_{uu} & M_{us} \\ M_{su} & M_{ss} \end{pmatrix}^{-1}$

corresponding to the vector $s_t$. The first assertion follows because
$P_{xu} = (\Pi_{xu} M_{uu} + \Pi_{xs} M_{su} + \Pi_{xr} M_{ru} + M_{\delta u}) M_{uu}^{-1}$ and $M_{su} \to 0$, $M_{ru} \to 0$, and $M_{\delta u} \to 0$
in probability by (3.1), (3.3) and Assumption H; the second assertion follows
similarly. Since matrix multiplication is continuous, and the characteristic roots
of a matrix are continuous functions of the matrix,

(3.8) $\operatorname{plim}\ s[P_{xs} M_{ss} P_{xs}'] = 0$,


where A/.. = (M„ - This follows fiom the well-known theorem 

(a proof of which is given in [4]) that if a random vector $X_T$ converges
stochastically to $X$, then $f(X_T)$ converges stochastically to $f(X)$ if $f$ is continuous.

We shall find the following lemmas convenient. The proofs are simple and
have been omitted.

Lemma 1. Let $B$ be positive definite, $A$ positive semi-definite. Then the smallest
root $\nu$ of $|A - xB| = 0$ is less than or equal to $s(A)/s(B)$.

5 See Section 4 of [2].

6 Because of the assertion above and Assumptions F and G only one characteristic root
of the matrix approaches zero in probability.




Lemma 2. Each element of a positive definite matrix is less in absolute value
than the largest characteristic root.
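Both lemmas are easy to check numerically in the 2 x 2 case. The sketch below computes the roots of $|A - xB| = 0$ from the quadratic $|B|x^2 - (a_{11}b_{22} + a_{22}b_{11} - 2a_{12}b_{12})x + |A| = 0$ and compares the smallest one with $s(A)/s(B)$; the example matrices and function names are illustrative assumptions, not taken from the paper.

```python
import math

def eig_sym2(m):
    """(smallest, largest) characteristic roots of a symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return mean - rad, mean + rad

def gen_roots2(A, B):
    """Roots x of |A - xB| = 0 for symmetric 2x2 A and positive definite B."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    qa = b11 * b22 - b12 ** 2                        # |B|
    qb = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)
    qc = a11 * a22 - a12 ** 2                        # |A|
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return (-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)

A = [[2.0, 1.0], [1.0, 2.0]]      # positive definite, characteristic roots 1 and 3
B = [[2.0, 0.0], [0.0, 1.0]]      # positive definite, characteristic roots 1 and 2

nu_min, nu_max = gen_roots2(A, B)
sA, lA = eig_sym2(A)
sB, lB = eig_sym2(B)
print(nu_min, sA / sB)            # Lemma 1: first value does not exceed the second
print(max(abs(e) for row in A for e in row), lA)   # Lemma 2 for the matrix A
```

The bound of Lemma 1 follows from the minimum property of the smallest root, so any other choice of a positive semi-definite $A$ and positive definite $B$ should behave the same way.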

Let $\nu$ be the smallest root of

(3.9) $|P_{xs} M_{ss} P_{xs}' - \nu W^*_{xx}| = 0$.

Then $\operatorname{plim}\ \nu W^*_{xx} = 0$. This statement follows from (3.8) and Lemmas 1 and 2.
Since 0 is a simple characteristic root of $\Pi_{xs} (\operatorname{plim}_{T \to \infty} M_{ss}) \Pi_{xs}'$, it follows from (3.9)
and the consistency of $P_{xu}$ and $P_{xs}$ that $\hat\beta$ approaches $\beta$ apart from normalization.
The following theorem results directly:

Theorem 1. Under Assumptions A, F, G, H, and I, and if $\operatorname{plim}_{T \to \infty} \beta \hat\Phi_{xx} \beta' = 1$,

(3.10) $\operatorname{plim}_{T \to \infty} \hat\beta = \beta$,

(3.11) $\operatorname{plim}_{T \to \infty} \hat\gamma = \gamma$,

where $\hat\beta$ and $\hat\gamma$ are calculated as if $r_t = 0$ and as if the remainder of A, B, C, and D
were satisfied. 7

3.2. The non-linear case. In this section we apply the estimates obtained in [2]
to an equation of a complete system in which the remaining equations may be
non-linear. We replace Assumption A by the following assumption:

Assumption J. The selected structural equation (2.2) is one equation of a complete
system of stochastic equations:

(3.11) $F_i(y_t, z_t) = \epsilon_{ti} \qquad (i = 1, \ldots, G)$.

Let us solve the complete system (3.11) for the components of $y_t$. We obtain

(3.12) $y_{tj} = h_j(z_t, \epsilon_t)$.

Let $u_t$ be the subvector of $z_t$ occurring in the selected structural equation.
Let $s_t$ be a vector function of $z_t$ such that $\operatorname{plim} M_{su} = 0$. We may write (3.12)
for those $y$'s occurring in the selected structural equation as

(3.13) $x_t' = \Pi_{xu} u_t' + \Pi_{xs} s_t' + \varphi'(z_t, \epsilon_t)$,

where the components of $\varphi(z_t, \epsilon_t)$ are the residuals from the formal limiting
regression of $x_t$ on $u_t$ and $s_t$. The proof of Theorem 1 can be used to prove the
following:

Theorem 2. If Assumptions F, G, H, I, and J are satisfied with $z_t$ replaced by
$(u_t, s_t)$ and $\delta_t$ replaced by $\varphi(z_t, \epsilon_t)$, and $r_t = 0$, and if $\operatorname{plim}_{T \to \infty} \beta \hat\Phi_{xx} \beta' = 1$, then

(3.14) $\operatorname{plim}\ \hat\beta = \beta$,

(3.15) $\operatorname{plim}\ \hat\gamma = \gamma$.


1 This follows from the above statements because 0 and y are (vector-valued) rational 
functions of M„ , Pi, , Wti and which approach limits in probability. 





4. The asymptotic distribution of the estimates.

4.1. The asymptotic distribution of $P_{xs}$ and $P_{xu}$. To obtain the asymptotic
distribution of the estimates we need stronger assumptions. Throughout Sections
4.1 and 4.2 we use Assumptions A, B, F, H, I, and the following:

Assumption K. The exogenous variables are bounded, the vector of disturbances 
of the complete system has mean zero, and is serially independent’, for some A > 0 
and some M, S(| St, | m ) < M\ the coordinates of z t may be linear combinations of 
lagged endogenous variables. If the endogenous part of a coordinate is 


00 0 

X X) gnVt- 

r=»l t*»l 


then 


and 


oo Q 

XX 

T-l t=l 


ffr. 


< CD 


23 dnUt—T,i 

jamt 1»1 

is bounded. 

Assumption L, The matrix <f> M is known and constant. 

Assumption M. For each i,j, Ic, l, 1 < i, j < H, 1 < k,l < K, 

1 T 

lim == Xj Fi(Si,5i,ztt,Zti) — Ki,ici 
r_,M 1 <-i 

exists. 

Let the components of M vv , M v , , be arranged as a vector m(T) with 
mean value p(T). Ithas been shown [3] that VT(m[T) - y(T)) is asymptotically 
distributed according to N( 0, 2), the normal distribution with mean 0 and 
covariance matrix 2 composed of elements 

cr„ = lim &(T[mi(T) - y,(T)] [,m,{T ) - a,(T)]). 

In conjunction with this result we make repeated use of a special case of Theorem 

6 tipple VT(x,r - €,r) O' - 1. ' ■ * . ») have the i omt ^ibulion 

N(0, T) with £, r being functions of T such that Jm !,*■ - f»• Let Ar(A , ■ * > 

dfkT / \ 

be random Borel-measurable functions of n real variables such that — = «*»«W 

exists with probability one for T sufficiently large and z in a fixed neighbor¬ 
hood of £, and suppose that there exist numbers a k , such that for any * > , 

and A > 0, P( sup , ! “ “»j I > <> <™ roaches * er °‘ *** « 

(*-fr>C* trl'iP • f"> _ , , t T ) the random variables 

Vf have the joint 'asymptotic distribution N(0, A*A% where A = 

(«•>)■ 





To obtain the asymptotic distributions we have only to verify that the assumptions
of this statement are satisfied, and compute $A$, since the asymptotic
distribution is characterized completely by $A \Psi A'$. We shall denote the element in
the $k$-th row and $l$-th column of $A \Psi A'$ by $\sigma(f_k, f_l)$. We shall find it convenient
to use the notation $df = A\,dx$; that is, the differential $df$ is defined in terms of the
limit matrix $A$.


Let 


(4.1) 

A *» M iu, 

(4.2) 

B - 

(4.3)* 

V ** plim M V u , 

sc 

(4.4) 

E - plim M„ , 

7““»ao 

(4.5) 

L - Piu, 

(4.6) 

P = P„ - Mi.Mit , 

(4.7) 

a - ii„, 

(4.8) 

11 - TLj . 

The matrix) L is tho 
random function BM 

random function .U/7,1, + 11 J( M r .,.!/(;}, -r A of A, P is the 
+ II of B. Then 

(4.9) 

tlL « (dA)Cr' , 

(4.10) 

However 

dP - (dB)ET' . 

(4.11) 

a-(a,fc , a,i) - a,,w , 

(4.12) 

a(a,k, b,i ) = 0j,ki, 

(4.13) 

a(b,k , b,i) — 7 ,,*/, 


where $\alpha_{ij,kl}$, $\beta_{ij,kl}$, $\gamma_{ij,kl}$ are the appropriate quantities $\kappa_{ij,kl}$, respectively. From
these we may compute $\sigma(l_{ij}, l_{kl})$, $\sigma(l_{ij}, p_{kl})$, and $\sigma(p_{ij}, p_{kl})$, the elements of the
asymptotic covariance matrix of the elements of $L$ and $P$ (which are asymptotically
normally distributed by the above). These elements can be estimated
consistently from the sample (the proof follows from Theorem 1).

4.2. The asymptotic distribution of $\hat\beta$ and $\hat\gamma$ for constant normalization. In this
section we shall show that $\hat\beta$ and $\hat\gamma$ are asymptotically normally distributed
(Theorem 3). In view of the above theorem on asymptotic distributions the
intricate part of the proof is in obtaining the covariance matrix. First we shall
demonstrate that the elements of $\nu W^*_{xx}$ are $o(1/\sqrt{T})$ in probability. Since Assumption
I holds, it is sufficient to show that $s(P_{xs} M_{ss} P_{xs}')$ is $o(1/\sqrt{T})$ in probability.
This means $d\,|P_{xs} M_{ss} P_{xs}'| = 0$, since each of the characteristic roots of
$P_{xs} M_{ss} P_{xs}'$ except the smallest approaches a non-zero limit in probability.





For any matrix $A$, $A_{ij}$ denotes the matrix obtained by deleting the $i$-th row
and $j$-th column from $A$, and $A_{ik,jl}$ is the matrix obtained by deleting the $i$-th
and $k$-th rows and the $j$-th and $l$-th columns. Let

$A^{ij} = (-1)^{i+j}\,|A_{ij}|$,

$A^{ij,kl} = (-1)^{i+j+k+l+\epsilon}\,|A_{ik,jl}|$,

where $\epsilon = 0$ if $(i - k)(j - l) > 0$, 1 otherwise when $i \neq k$, $j \neq l$; $A^{ij,kl} = 0$
if $i = k$ or $j = l$. In the rest of the paper we use the summation convention of
tensor calculus for lower case indices; namely, that whenever a lower case letter
appears as a superscript and a subscript in an expression, the corresponding
terms are to be summed on that index.

In general

(4.14) $d|A| = A^{ij}\, da_{ij}$.
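Relation (4.14) says that the partial derivative of $|A|$ with respect to the element $a_{ij}$, all elements being treated as independent, is the cofactor $A^{ij} = (-1)^{i+j}|A_{ij}|$. The sketch below checks this by central differences on a small example; the matrix and the helper names are illustrative assumptions.

```python
def minor(m, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    """A^{ij} = (-1)^{i+j} |A_ij|."""
    return (-1) ** (i + j) * det(minor(m, i, j))

def numeric_partial(m, i, j, h=1e-6):
    """Central-difference estimate of d|A| / d a_ij, other elements held fixed."""
    up = [row[:] for row in m]
    dn = [row[:] for row in m]
    up[i][j] += h
    dn[i][j] -= h
    return (det(up) - det(dn)) / (2.0 * h)

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

for i in range(3):
    for j in range(3):
        print(i, j, cofactor(A, i, j), round(numeric_partial(A, i, j), 6))
```

Because the determinant is linear in each single element, the central difference agrees with the cofactor up to rounding error.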

We may consider P x ,M„PL as a random function of P„. Then 

(4.15) d(i,j -th element of P X ,M.,P' XI ) = t \e kl dp) + . 

However 


(4.16) (n M Pn',)’ J = p’A = p y, 

where p { is a factor of proportionality. Since /3II ia = 0, we have d \ P X ,M„P' X , | = 0. 
Then it can bo shown that d(ft i ,M„ft(, - P x ,M, a P' z ,) ~ 0, where ft,,. = 
I _ P, 


Let 6 — n ia eri(, and F = P Z ,M„P', X . We know that ft, = where 
Pj — l/p J (and the capital letter J indicates that there is not to be a sum on 
that index), and 6 = . Hence 

(4.17) dft' = Pj dG' J + Q' J dpj . 


However ft'ftp,, = 1; therefore Pj = {Q' J Q kJ <p, k ) i Prom this it follows that 

(4.18) dp, = -(p,) 3 0’y t d0 w 
From (4.14) wo see dQ kJ = Q l,J ' aB dS a §. Therefore 

(4.19) dft i - PJ [Q <J ' af> - p'p l e kJ ' al> ni]dLB. 

Let us define = (3V.y • Let us multiply (4.19) by and . We obtain 
O^dft' = pjSy^'^dLp 

(4 ' 20) = PJ S Se«d6« - PjO^ddy? = -fi °dty a , 

(4.21) Mft' = 0. 

Let us simplify (4.20). We see that 

( 4 . 22 ) p a d§ ya = ^■/yBkidpl. 





Hence 

cQi a d^„ , ifdO-,*) « fi"/ Y Cjt(^/ Y fA,e'’ n e , Vwm. 

(4-23) „ . 

* dViryir,y aiin , = n-,,, 

say, Let tr()9‘, /5 J ) = </{', and lot Ch “ (<?P). Then from (4.20) and (4.23) we obtain 
(4.24) 0^0 -= R,, 

and (4.21) is 

(4.26) fQi “ 0. 

It may be shown (see [1], for example) that the solution is 

( 4 . 20 ) Qi m a - M). k {Q Lk r\RiMOnr\i - m*., 

where $k$ ($1 \le k \le H$) is arbitrary except that $\beta^k \ne 0$, and $A_{\cdot k}$ denotes $A$ with
the $k$-th column deleted, etc. If the normalization is $\beta^i = 1$, $k = i$ is a convenient
choice.

Since ^ — —jlL, 

(4.27) di m -> -ctf'X? - d'd/7 . 

Hence 

(4.28) rtf’, D - -<r09 y , j3*)X? - «r(|3' 

(4.29) v(T, f) - *0', jWtf + rtf', ff)/Sty + <r(|9', l+ <r(l?, VW. 

We, therefore, see that we must compute <t 0\ T?)fi' and a-(IT, We find, 

from (4.20), (4.21), and (4.22) that 

(4.30) 17) = ~P'ii l Tr k ]C mp pijj, k = r7 T , 

say. Let (<r($ y , C)d') = Qi , and let R 2 = (rT 7 ). Then, from (4,30) and (4.21) we 
obtain 


(4.31) 0 & .= Rt , 

(4.32) = 0. 

The solution is 

(4.33) <&-(/- ■ 

We find, readily, that 

(4.34) dV<KC, *7) = ^"VV/w - tf", 

say, where (c mi> ) = CT 1 . Let Q 3 = (g'""). This concludes the proof of Theorem 3. 

Theorem 3. If Assumptions A, B, F, H, I, K, L, and M are satisfied, $\sqrt{T}(\hat\beta - \beta)$
and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with means zero
and covariance matrix

(4.35) $\sigma(\hat\beta', \hat\beta) = Q_1$,

(4.36) $\sigma(\hat\beta', \hat\gamma) = -Q_1\Pi_{xu} - Q_2$,

(4.37) $\sigma(\hat\gamma', \hat\gamma) = \Pi'_{xu}Q_1\Pi_{xu} + \Pi'_{xu}Q_2 + Q'_2\Pi_{xu} + Q_3$,

where $Q_1$ is given by (4.26), $Q_2$ by (4.33), and $Q_3$ by (4.34).

If there is a kind of asymptotic independence of $\zeta_t$ and $z_t$, then the above
expressions may be simplified. Corollary 1 results from Theorem 3 and the
following assumption:

Assumption N. $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} \mathcal{E}(\zeta_t^2 z_t' z_t) = \sigma^2 R$, where $R$ is defined in Assumption F.

Corollary 1. If Assumptions A, B, F, H, I, K, L, M, and N are satisfied,
$\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with
means zero and covariance matrix

(4.38) a$', fo = <r*(7 - fi'fi) k {Q kk )~\l - mi , 

(4.39) c($', y) = -c\l - p'+). k {Q kk r\fi xu + m k . , 

(4.40) &($'> = c 2 [(n* u + TV).t(0a) *(5,,, + f'y)k + C -1 ]. 

4.3. Asymptotic distribution of the estimates of the parameters $\beta$ and $\gamma$ with
normalization a function of $\Omega_{xx}$.

If we relax Assumption L that $\Phi_{xx}$ is constant, we obtain a more general
result. Since the proof, however, is more involved, we shall not give it here;
the reader is referred to [1]. In the derivation of the estimates $\Omega_{xx}$ was defined as
$\mathcal{E}(\delta_t'\delta_t)$. In the asymptotic theory we do not assume that this is the same for
each $t$. We use the following assumption:

Assumption 0. lim — 22 — n„ki exists', 

r_*oo I t—1 


lim m 12 <S(<5i.<5 1 ,) = exists-, 

T—*00 i i —1 

l T 

lim ■=, 22 S>{SuS t ,8tkSti) = <h l} ki + UijUki exists. 

Let 5„fei be the quantities n llk i corresponding to the ids, e„*j, the quantities 
corresponding to the c's. Define 


(4.41) 


(4.42) 

nk l tj 

Ui = P ir T X «<>*» 7 

(4.43) 

g 4 ' = (/ — /?V) fc(©w) V*)*' 

(4.44) 

t; fc I ~ 

g 6 = X X u,,*! , 

(4.45) 

£6 — X M • 


With the aid of the matrices $Q_1$, $Q_2$, and $Q_3$, the vectors $q_4$ and $q_5$, and the
scalar $q_6$, we may express the asymptotic covariance matrix of the estimates.
We obtain

Theorem 4. If Assumptions A, B, F, H, I, K, M, and O are satisfied, and
$\Phi_{xx}$ is a function of $\Omega_{xx}$, $\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly
normally distributed with means zero and covariance matrix

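(4.46) $\sigma(\hat\beta', \hat\beta) = Q_1 + q_4'\beta + \beta' q_4 + q_6\beta'\beta$,

(4.47) $\sigma(\hat\beta', \hat\gamma) = -Q_1\Pi_{xu} + q_4'\gamma - \beta' q_4\Pi_{xu} + q_6\beta'\gamma - Q_2 - \beta' q_5$,

(4.48) $\sigma(\hat\gamma', \hat\gamma) = \Pi'_{xu}Q_1\Pi_{xu} - \Pi'_{xu}q_4'\gamma - \gamma' q_4\Pi_{xu} + q_6\gamma'\gamma$
       $\quad + \Pi'_{xu}Q_2 + Q'_2\Pi_{xu} - \gamma' q_5 - q_5'\gamma + Q_3$,

where $Q_1$, $Q_2$, $Q_3$, $q_4$, $q_5$, and $q_6$ are given by (4.26), (4.33), (4.34), (4.43), (4.44),
and (4.45) respectively.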

Corollary 2. If Assumptions A, B, D, F, H and K are satisfied, and
$\Phi_{xx} = \Omega_{xx}$, $\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally
distributed with means zero and covariance matrix

(4.49) c t 0 ', j§) - (/ - «)-*(/ - + tfl'jS, 

(4.50) <r(4', *)■-(/- /3V) *(0* k )"‘(frx« + + ^'7, 

(4.51) <r(f, •?) - (ft™ + + f 7 )*. + CT l + h'y. 

5. Asymptotic distribution of the likelihood ratio criterion and the small
sample criterion for testing a certain hypothesis. The likelihood ratio criterion
for testing the hypothesis that the number of coordinates of $z_t$ with zero
coefficients in the selected structural equation is as great as it is assumed to be is
$(1 + \nu)^{-T/2}$ [2, Theorem 2], where $\nu$ is the smallest root of

(5.1) $|P_{xs}M_{ss}P'_{xs} - \nu W_{xx}| = 0$.

Then 

(5.2) T,-T - (Vrfip.) (VT$r,.y. 

From Theorem 5 of [4] it follows that the asymptotic distribution of Tv is the 

E 

same as that of the quadratic form x x\ where x has the limiting distribution 
of ■%/ T$Px,, use being made of plim - a 1 . Wo have 

(5.3) dx' =* 0 ! dp) + dftir) , 

LetT = (I - /3V).fc(e**) -1 (/ - t'0) k .. Then 

(5-4) d& ! = -vVt”<W dpi . 

Substituting in (5.3), we obtain 

^•5) dx' = 0 s dp) — i/*0V*e mB dpjir ), 





Then 

(5 6) n ‘0> : \ -T") = <rV° - TTfcTT^ 15 ) = 

say, ancl (£'") = E 

Let F he- chohOn so E = FF' and F'SF = ^ is diagonal. Since EaE^E = EaE, 
the diagonal elements of T are 1 and 0. The number of elements that are 1 is 
the rank of Es.E, namely, D — II + 1 , wheie D is the number of coordinates 
of Vi (the number of coordinates whose coefficients m the selected equation are 

assumed to lie zero). Let 2 = ~xF. Then the asymptotic distribution of Tv 

(7 

is the distribution of 22 ' where 2 is normally distributed with mean zero and 
covariance matrix T. It is the x 2 -distribution with D - H + 1 degrees of freedom 
We observe that $T\log(1 + \nu)$ and $TD\lambda$ are asymptotically equal to $T\nu$, where $\lambda$
is the criterion based on small sample theory [2, Theorem 4]. Finally, we note
that $\nu$ is independent of the normalization of $\beta$.

Theorem 5. If Assumptions A, B, F, H, I, K, M, and N are satisfied, $-2$ times
the logarithm of the likelihood ratio criterion, $-\frac{T}{2}\log(1 + \nu)$, the asymptotically
equivalent $T\nu$, and $TD$ times the small sample criterion, $\lambda$, for testing the hypothesis
that the number of coordinates with zero coefficients is $D$ are asymptotically distributed
as $\chi^2$ with $D - H + 1$ degrees of freedom.

This theorem indicates how conservative the small sample test is asymptotically,
for that test asymptotically is equivalent to using $T\nu$ as having an
asymptotic $\chi^2$-distribution with $D$ degrees of freedom.

6. Asymptotic behavior of confidence regions based on small sample theory.
In [2] we deduced confidence regions for $\beta$ and for $\beta$ and $\gamma$ when Assumption E
holds. If the normalization of $\beta$ is

(6.1) $\beta\Phi_{xx}\beta' = 1$,

where $\Phi_{xx}$ is a given matrix, then a confidence region (a) for $\beta$ of confidence $\varepsilon$
consists of all $\beta^*$ satisfying (6.1) and


( 6 . 2 ) 


t3*M»M7t M sx p*' s 
3 *' 



F d,t-k(<) , 


where $F_{D,T-K}(\varepsilon)$ is chosen so the probability of (6.2) for $\beta^* = \beta$ is $\varepsilon$ and $K$ is
the number of coordinates of $z_t$ and $D$ is the number of coordinates of $v_t$. A
region (b) for $\beta$ and $\gamma$ simultaneously consists of $\beta^*$ and $\gamma^*$ satisfying (6.1) and


(6.3) 


+ y*M ux 0*' + y*M m y*' 

P*Wzxl3*' 

< yzji 


We shall now show that even if Assumption E does not hold the regions have
asymptotically confidence coefficients $\varepsilon$ and they are consistent under general
conditions.


Let r « i ?, <• - . We observe from Hectinn 4 that if 

Assumptions A, U, f, II, K, L, M and X are satisfied, the vectors \ZTc and 
yT’e have asymptotic independent distribution* A’(l), and A r ((), air), 
respectively Then TcM v / f and 7VAL/V will have asymptotic, independent 
X l «diatrihutions with F( - R- I)) and 1) decrees of freedom, respectively, 
Also approaches a stochastically. By Theorems f» and 0 of [4], the left- 
hand sides of (0.2) and {(1,8) have asymptotic /''-distriimtions with /) and 2' - K 
degrees of freedom and A' and T - K degrees of freedom, respectively, 

We shall prove that (a) is consistent for d; the proof is similar for (b) as a 
regionfor/Jandy, If we replace d by b in the definition of p } cM„c,'^ bMuMuM,J)'. 
For b ^ d we must show that the probability that h will fall in the confidence 
region ford approaches zero. The above form approaches bUjUll'j)' in proba¬ 
bility. If b & d and satisfies (0.1) then ML ^ 0 and cM„c' has a non-zero limit 
in probability since A is positive definite. Thus b is not in the limiting confidence 
region, 

Theorem 6. If Assumptions A, B, F, H, I, K, M, and N are satisfied, the
confidence regions of Theorem 3 of [2] (including (a) and (b) above) are consistent,
and the regions (a) and (b) have asymptotically the confidence levels $\varepsilon$.


REFERENCES 

[1] T. W. Anderson and Herman Rubin, "Estimation of the parameters of a single
stochastic difference equation in a complete system," Cowles Commission for Research
in Economics, 1947, dittoed.

[2] T. W. Anderson and Herman Rubin, "Estimation of the parameters of a single
equation in a complete system of stochastic equations," Annals of Math. Stat., Vol. 20
(1949), pp. 46-63.

[3] H. Rubin, "Consistency and asymptotic normality in stable linear stochastic difference
systems," to be published.

[4] H. Rubin, "Topological properties of measures on topological spaces," Duke Math.
Journ., to be published.



SOME NONPARAMETRIC TESTS OF WHETHER THE LARGEST 
OBSERVATIONS OF A SET ARE TOO LARGE 
OR TOO SMALL 

By John E. Walsh 
The Rand Corporation 

1. Summary. Let us consider a large number n of observations which are statis¬ 
tically independent and drawn from continuous symmetrical populations. This 
paper presents some nonparametric tests of whether the r largest observations 
of the set are too large to be consistent with the hypothesis that these populations
have a common median value. Tests of whether the r largest observations are 
too small to be consistent with this hypothesis are also considered. Here r is a 
given integer which is independent of n. 

Subject to some weak restrictions, it is shown that the significance level of a
test of the type presented tends to a value $\alpha$ as n increases. For no admissible
value of n, however, does the significance level of this test exceed $2\alpha$. If whether
the largest observations are too large is considered, tests with values of $\alpha$ suitable
for significance levels can be obtained for r > 4. Values of $\alpha$ suitable for
significance levels can be obtained for any value of r if whether the largest
observations are too small is investigated (n large).

Properties of the power functions of these tests are considered for the special
case in which the r largest observations are from populations with common
median $\theta$, the remaining observations are from populations with common
median $\phi$, and each population has the property that the distribution of the
quantity

(sample value) - (population median)

is independent of the value of the population median. For tests of $\theta > \phi$, the
power function tends to zero as $\theta - \phi \to -\infty$ and to unity as $\theta - \phi \to \infty$. For
tests of $\phi > \theta$, the power function tends to unity as $\theta - \phi \to -\infty$ and to zero
as $\theta - \phi \to \infty$.

Analogous tests of whether the smallest observations of a set are too small or
too large can be obtained from the tests of the largest observations by symmetry
considerations.

If there is strong reason to believe that the set of observations is a random
sample from a continuous population, the tests presented in this paper can be
used to decide whether the population is symmetrical. Tests of this nature are
sensitive to symmetry in the tails of the population but not to symmetry in the
central part.

2. Introduction and statement of tests. The tests derived in this paper are 

applicable to situations of the following two types: 

(a). It is known that the observations are independent and from continuous
symmetrical populations (i.e., each population has a continuous cdf $F(x)$
such that $F(\phi - x) = 1 - F(\phi + x)$, where $\phi$ is the population median).
It is desired to test whether the largest few observations are too large
(or too small) to be consistent with the assumption that the populations
have a common median value (if the 50% point of a continuous symmetrical
population is not unique, the median of this population is defined
to be the midpoint of the interval of 50% points).

(b). It is known that the observations are independent and from continuous
populations with a common median value (e.g., the observations may
be a sample from a continuous population). It is desired to test whether
these populations are symmetrical (with emphasis on the tails of the
population).

With respect to (a), perhaps the most common practical application is that
where the observations are assumed to be a sample from a continuous symmetrical
population of some special type (e.g., normal) but the values of the
largest few observations make this assumption questionable. The nonparametric
tests presented for (a) are easily applied and a significant result for a
nonparametric test automatically implies that the observations are not a sample
from the specified type of population. Furthermore, if a parametric test of this
situation (i.e., a test based on the assumption of a sample from this special type
of population) is significant, the nonparametric tests are useful in determining
whether it is possible that the observations might be a sample from a continuous
symmetrical population of some other type.

With respect to (b), perhaps the most common application is that where the
set of observations can be considered to be a sample from a continuous population
and it is desired to test whether this population is symmetrical in the tails.

Now let us consider the forms of the tests. Let $x(1), \cdots, x(n)$ represent the
values of the n observations arranged in increasing order of magnitude. Then
$x(n + 1 - r), x(n + 2 - r), \cdots, x(n)$ are the r largest observations of the
set. For situations of type (a), the tests of whether the r largest observations
are too large are of the form

Test 1. Accept that the r largest observations are too large to be consistent with
the hypothesis that the populations have a common median if

$\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha)$,

where the i's, j's and n are integers such that

$i_s = r,\quad i_u < i_{u+1},\quad j_v < j_{v+1},\quad j_s < W_\alpha < n + 1 - r$,

$\alpha$ is defined by

$\alpha = \Pr\{\min[x(n + 1 - i_k) + x(j_k)] > 2\phi \mid \phi = \text{common median}\}$,

and $W_\alpha = W_\alpha(n)$ is the smallest integer satisfying the relation

(1) $\Pr[x(W_\alpha) < \phi \mid \phi = \text{common median}] \le \alpha$.



In testing the hypothesis of Test 1, the principle followed is to choose
$x(n + 1 - r)$ and some subset of $x(n + 2 - r), \cdots, x(n)$ for use in the test.
The integer s represents the total number of order statistics selected from
$x(n + 1 - r), \cdots, x(n)$.

The value of $\alpha = \alpha(i_1, \cdots, i_s;\ j_1, \cdots, j_s)$ is independent of n and is given
by equation (4) in Section 3. Table 1 contains some values of the i's, j's and s
which yield values of $\alpha$ suitable for significance levels. For Test 1, values of $\alpha$
suitable for significance levels can be obtained for r > 4.
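The decision rule of Test 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `walsh_test1` and its argument layout are hypothetical, the critical index `w` is supplied by the caller, and the range check on `w` reflects one reading of the admissibility condition in the statement of the test.

```python
def walsh_test1(observations, i_idx, j_idx, w):
    """Return True if Test 1 declares the largest observations too large.

    observations : iterable of numbers (assumed distinct, as for continuous data)
    i_idx, j_idx : equal-length sequences of 1-based integers i_1 < ... < i_s
                   and j_1 < ... < j_s (hypothetical argument names)
    w            : the index W_alpha, supplied by the caller
    """
    x = sorted(observations)
    n = len(x)
    if len(i_idx) != len(j_idx) or not i_idx:
        raise ValueError("i_idx and j_idx must be non-empty and of equal length")
    # Assumed admissibility condition: j_s < W < n + 1 - i_s.
    if not (max(j_idx) < w < n + 1 - max(i_idx)):
        raise ValueError("w is outside the assumed admissible range")

    def order_stat(k):
        # x(k) in the paper's 1-based notation
        return x[k - 1]

    statistic = min(order_stat(n + 1 - i) + order_stat(j)
                    for i, j in zip(i_idx, j_idx))
    return statistic > 2 * order_stat(w)
```

For instance, with `i_idx=(4, 5)` and `j_idx=(1, 2)` the statistic is the smaller of x(n - 3) + x(1) and x(n - 4) + x(2), which is compared with twice x(W).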


TABLE 1 

Some values of a for s < 5 



If the n independent observations satisfy the additional conditions

(i). Asymptotically ($n \to \infty$), $x(W_\alpha)$ is statistically independent of
$\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$,

(ii). The standard deviations of $x(W_\alpha)$ and $\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$
exist for all $n \ge i_s + j_s - 1$ and the limiting ratio ($n \to \infty$)
of these standard deviations is either zero or infinite,                (A)

(iii). Let the notation $\sigma(z)$ denote the standard deviation of $z$. Then, if the
populations have a common median $\phi$, asymptotically the cdf's of
$\{x(W_\alpha) - \phi\}/\sigma[x(W_\alpha)]$ and $\{\min[x(n + 1 - i_k) + x(j_k)] - 2\phi\}/\sigma\{\min[x(n + 1 - i_k) + x(j_k)]\}$
are continuous at the point zero,

then the significance level of Test 1 approaches the value $\alpha$ as n tends to infinity.

Although conditions (A) may appear to be complicated, they are not very
restrictive. These conditions are satisfied if the n observations are a sample
from a continuous population of the type usually encountered in practical
situations (i.e., approximated in practical situations). Perhaps the most well
known type of continuous symmetrical population for which a sample does not
satisfy conditions (A) is that with a triangular probability density function.
Part (ii) of conditions (A) is not satisfied for a sample from a population of
this type.

For large a, relation (1) with the equality sign is approximately satisfied if 
]V„ = pi + J A"„V n, (i e., the largest integer contained in pi + $K„\/n). 
Here $K_\alpha$ is the standardized normal deviate exceeded with probability $\alpha$. This
value for $W_\alpha$ was obtained from the normal approximation to the binomial
theorem and furnishes a reasonably accurate solution of (1) with the equality
sign for n > 10 (see [1]).
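As a cross-check on the normal approximation, relation (1) can also be solved exactly. The sketch below rests on one observation: when the observations are independent, continuous, and share the median $\phi$, each falls below $\phi$ with probability 1/2, and $x(W) < \phi$ holds exactly when at least W observations fall below $\phi$. The function name `smallest_w` is hypothetical and not part of the paper.

```python
from math import comb


def smallest_w(n, alpha):
    """Smallest W in 1..n with Pr[x(W) < median] <= alpha, or None if none exists.

    Uses Pr[x(W) < median] = Pr[Binomial(n, 1/2) >= W], which assumes
    independent continuous observations with a common median.
    """
    total = 2 ** n
    tail = total  # number of below/above patterns with at least 0 values below
    for w in range(1, n + 1):
        tail -= comb(n, w - 1)  # now counts patterns with at least w values below
        if tail <= alpha * total:
            return w
    return None
```

For n = 10 the upper binomial tail beginning at 8 is 56/1024, so `smallest_w(10, 0.0547)` returns 8; in a real application n must also be large enough that the resulting index is admissible for the test.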

As an example of a test of type 1, let $r = 5$, $s = 2$, $j_1 = 1$, $j_2 = 2$, $i_1 = 4$,
$i_2 = 5$. Then $\alpha = .0547$ and the test is (approximately)

Test 2. Accept the specified alternative of Test 1 if

min [x(n — 3) + j( 1), x(n - 4) + x(2)] > 2r(pi + iK.mrVn). 

That this is a test of whether the 5 largest observations are too large is intuitively 
evident from the fact that a significant result will he obtained only if both 

x(n — 3) > 2x(pi + %K.omV n) - x(l), 

x(n — 4) > 2a;(|n - 1 - \K.mr\/n) — x(2). 

If the smallest two of the five largest observations are too large, it seems reasonable
to suppose that all of the five are too large. A similar interpretation exists
for all tests of the type of Test 1.

The type (a) tests of whether the largest observations are too small are of
the form

Test 3. Accept that the r largest observations are too small to be consistent with
the hypothesis that the populations have a common median value if

$\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n + 1 - W_\alpha)$,

where $j_s = r$, $j_v < j_{v+1}$, $i_u < i_{u+1}$, $i_s < n + 1 - W_\alpha < n + 1 - r$, and both $\alpha$
and $W_\alpha$ are defined in Test 1.

From the results for Test 1 and symmetry considerations, the significance
level of Test 3 tends to $\alpha$ as $n \to \infty$ if conditions (A) are satisfied; it does not
exceed $2\alpha$ for any admissible value of n. For Test 3, values of $\alpha$ suitable for
significance levels can be obtained for all values of r (n sufficiently large).

As indicated by (2), the tests of whether the largest observations are too large
can also be interpreted as tests of whether the smallest observations are too
large. Similarly the tests of whether the largest observations are too small can
also be interpreted as tests of whether the smallest observations are too small.

The above discussion presents intuitive reasons for believing that Tests 1 and 3
are suitable for the situations to which they are applied. To obtain a
semi-quantitative measure of the suitability of these tests, this paper investigates
the special case in which the r largest observations are from continuous symmetrical
populations with common median $\theta$, the remaining observations are
from continuous symmetrical populations with common median $\phi$, and each
population has the property that the distribution of $x - \psi$ is independent of $\psi$,
where $x$ is an observation from the population and $\psi$ is the median of the population.
The power function of a test of type 1 or 3 is defined to be the probability
that the test is significant given the value of $\theta - \phi$. It is found that the power
functions of these tests have several desirable properties: For Test 1, the power
function tends to zero as $\theta - \phi \to -\infty$, is a monotonically increasing function
of $\theta - \phi$ for $\theta - \phi < 0$, and tends to unity as $\theta - \phi \to \infty$. For Test 3, the
power function tends to zero as $\theta - \phi \to \infty$, is monotonically decreasing for
$\theta - \phi < 0$, and tends to unity as $\theta - \phi \to -\infty$.

For testing whether the populations are symmetrical in the tails given that
they are continuous and have a common median, i.e., situation (b), a combination
of Tests 1 and 3 is used. The resulting test is

Test 4. Accept that the populations are not symmetrical in the tails if either

$\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)$

or

$\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(n + 1 - W_\alpha)$,

where $0 < i_u < i_{u+1}$, $j_v < j_{v+1}$, $j_w < i_w$, $j_s < W_\alpha < n + 1 - i_s$, and both $\alpha$
and $W_\alpha$ are defined in Test 1.

Since both inequalities in Test 4 can not be satisfied simultaneously, the
significance level of Test 4 tends to $2\alpha$ as $n \to \infty$ if conditions (A) are satisfied;
it never exceeds $4\alpha$ for any admissible value of n.

The asymptotic distribution ($n \to \infty$) of $x(W_\alpha)$ is usually not very sensitive
to symmetry of the populations. For example, if the n observations are a sample
from a population with a probability density function $f(x)$ such that $f(\phi) \ne 0$
($\phi$ = population 50% point), and $f(x)$ exists and is continuous in a neighborhood
of $x = \phi$, it can be shown that the only property of $f(x)$ which influences the
asymptotic distribution of $x(W_\alpha)$ is the value of $f(\phi)$. Thus, since a type 1 test
investigates both whether the largest observations are too large and whether
the smallest observations are too large (to be consistent with the assumption of
symmetry), while a type 3 test investigates both whether the largest observations
are too small and whether the smallest observations are too small, Test 4 should
be suitable for testing whether a population has symmetrical tails.


3. Theorems and derivations. The fundamental fact used in this paper is
that, if the observations are from continuous symmetrical populations with
common median $\phi$, the value of

$\alpha = \Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2\phi\}$
$\quad = \Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

is independent of n for the values of n permitted in the tests. This result is a
special case of the following theorem.

Theorem 1. Consider a set of n independent observations from continuous
symmetrical populations with common median $\phi$. Let $i_1 < \cdots < i_s$ and $j_1 < \cdots < j_s$
be fixed sets of integers whose values are independent of n. Then the value of

$\Pr\{\beta\text{th largest of } [x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

is the same for all values of n which are $\ge i_s + j_s - 1$. In particular

o—W 

a = l 


(3) 


where 


m(l ) w(8) m(5)—fcj 

1 + m(l) •+* 23 [m (1) — hil + 23 23 Ml) — hi — hf\ + 


miu) m(u—D— 

+ 23 23 


»H1—*)■— ■—lu_t 

23 tm(l) 

*i~i 


hi 




w = i,+ j, - 1 , u = j, — 1 , m(j, + v t — 1 ) - i, + j, — it — ft — v t + 1 , 

t = 0, 1, • * ■ , « - 1, 1 <> Vi < jtn - ft, to “ jo ~ 1 - 0. 

Proof. It is sufficient to prove the theorem for the expression

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$,

since any probability expression of the form $\Pr\{\beta\text{th largest of } [\ ] < 2\phi\}$
can be expressed as a specified constant plus a sum of probabilities of the form
$\Pr\{\max[\ ] < 2\phi\}$ multiplied by specified constants, where in each case the
terms in the $[\ ]$ are a subset of the s terms $x(n + 1 - j_k) + x(i_k)$, $(1 \le k \le s)$.
Let the integer n have the value $n_0$. Then it can be verified that

Pr (max [:r(no + 1 - j k ) + *(»*); 1 < k < s] < 20) 

(4) = Pr[max (2a;(nfl — j,) t x\no + 1 — 1+] + a;[no + 1 — W — m(W)}', 

1 ^ W < j,) < 24 

where 

m(ji + Vi — 1) = no + 2 — », — j t — v t , m(J>) - n 0 — i. — j> > l, 

< = 0, 1, • ■ • , s - 1, 1 < v< < ji+i — jt , io = jo ~ 1 = 0, 

by the use of Theorem 4 of [2], By the proof of Theorem 5 of [2], the value 
of the second term in (4) equals 





l>r[mu\ ( 2 .r(«„ - j,' I, .(/i 0 + 2 - IK] + x[a„ + 1 - IK - m{W)}, 

l<W<j. + l} < 20 ] 

ifwif,/a ~b 1 ) — 1 and the qxprcssion is based on n 0 + 1 rather than tiq observations 
(the values of the m s are the same as in (4)). The value of this expression, 

however, can lie shown to equal the value of 

M max {2r(n„ + 1 - j t ), .r[n 0 + 2 - W) + x[n D + 2 - W - m(TF)]; 

1 <W< j,} < 20], 

which by (4) equals the value of 

2 V{max (.rfao + 2 - j k ) + x(n)] 1 < k < s] < 20 ) 

if $n = n_0 + 1$ for this expression. Thus, by induction, the value of

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

is the same for all sample sizes $n \ge i_s + j_s$. An analysis similar to that used
in the proof of Theorem 5 of [2] shows that this also holds for $n = i_s + j_s - 1$.
Equation (3) was obtained by taking $n = w = i_s + j_s - 1$, the m's as given by
(4) with this value of n, and substituting into Theorem 4 of [2].

Another basic result is that, if the observations are from continuous symmetrical
populations with common median $\phi$, the value of

$\Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\}$
$\quad = \Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(n + 1 - W_\alpha)\}$

is always less than or equal to $2\alpha$. This is a particular application of the theorem

Theorem 2. Consider n independent observations from continuous symmetrical
populations with common median $\phi$. Then, for any integer W,

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(W)\}$
$\quad \le \Pr\{\max[x(n + 1 - j_k) + x(i_k)] < 2\phi\} + \Pr\{x(W) > \phi\}$
$\qquad - \Pr\{\max[x(n + 1 - j_k) + x(i_k)] < 2\phi,\ x(W) > \phi\}$.

Proof.

$\Pr\{\max[\ ] < 2x(W)\} = \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}$
$\qquad + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi,\ \max[\ ] < 2x(W)\}$
$\qquad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi,\ \max[\ ] < 2x(W)\}$
$\quad \le \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi\}$
$\qquad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi\}$
$\quad = \Pr\{\max[\ ] < 2\phi\} + \Pr\{x(W) > \phi\} - \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}$.



If the n independent observations satisfy conditions (A) in addition to being
from continuous symmetrical populations with a common median value, the
significance level of Tests 1 and 3 tends to $\alpha$ as $n \to \infty$. This follows from
symmetry considerations and

Theorem 3. Consider n independent observations which satisfy conditions (A)
and are from continuous symmetrical populations with a common median value.
Then

$\lim_{n\to\infty} \Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\} = \alpha$.

Proof. Let

$Y = \min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$

and consider the case where

$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = 0$.

Since the populations are continuous, $\sigma(Y) > 0$ and

$\Pr[Y > 2x(W_\alpha)] = \Pr[Y - 2\phi > 2x(W_\alpha) - 2\phi]$
$\quad = \Pr\{[Y - 2\phi]/\sigma(Y) > 2[x(W_\alpha) - \phi]/\sigma(Y)\}$.

Let

$Z = 2[x(W_\alpha) - \phi]/\sigma(Y)$.

Then, from (i) of conditions (A),

$\Pr[Y > 2x(W_\alpha)] = \int_{-\infty}^{\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > a\}\, dF_n(a) + \beta(n)$,

where $F_n$ is the cdf of $Z$ and $\lim_{n\to\infty} \beta(n) = 0$.

Let $b$ be any positive number. From $\lim_{n\to\infty} \sigma(Z) = 0$, (ii) of conditions (A), and
the definition of $x(W_\alpha)$, the mean of $Z$ exists for all values of n and tends to
zero as $n \to \infty$. Then, by Tchebycheff's Inequality, it can be shown that

$\int_{-b}^{b} dF_n(a) = 1 - \gamma(n)$,

where $\lim_{n\to\infty} \gamma(n) = 0$.

From (iii) of conditions (A)

$\lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > -b\} = \lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > b\} + \delta(b)$,

where $\lim_{b\to 0} \delta(b) = 0$.

Using the above relations, letting $n \to \infty$ first and then $b \to 0$, it follows from
Theorem 1 that

$\lim_{n\to\infty} \Pr[Y > 2x(W_\alpha)] = \Pr\{[Y - 2\phi]/\sigma(Y) > 0\} = \alpha$.


A similar type proof shows that this limiting relation also holds when

$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = \infty$.

Finally consider properties of the power functions of Tests 1 and 3 for the
special situation outlined in sections 1 and 2. The properties stated in the
preceding two sections follow from

Theorem 4. Let $x(n + 1 - r), \cdots, x(n)$ be from continuous symmetrical populations
with common median $\theta$, the remaining order statistics from continuous symmetrical
populations with common median $\phi$, and each population have the property
that the distribution of $x - \psi$ is independent of $\psi$, where $x$ is an observation from
the population and $\psi$ is the median of the population. Also let

$P_1(\Phi) = \Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha) \mid \theta - \phi = \Phi\}$,

where the conditions for Test 1 are satisfied, and

$P_3(\Phi) = \Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n + 1 - W_\alpha) \mid \theta - \phi = \Phi\}$,

where the conditions for Test 3 are satisfied. Then

$\lim_{\Phi\to-\infty} P_1(\Phi) = 0, \quad \lim_{\Phi\to\infty} P_1(\Phi) = 1$,

$\lim_{\Phi\to-\infty} P_3(\Phi) = 1, \quad \lim_{\Phi\to\infty} P_3(\Phi) = 0$,

$P_1(\Phi)$ is a monotonically increasing function of $\Phi$ for $\Phi < 0$, and $P_3(\Phi)$ is a
monotonically decreasing function of $\Phi$ for $\Phi < 0$.

Proof. It is sufficient to prove this theorem for the power function of Test 3.
The results for $P_1(\Phi)$ can be obtained from symmetry considerations and obvious
modifications of the proof for $P_3(\Phi)$.

First consider $P_3(\Phi)$ for the case where $\Phi < 0$. Let a new set of observations
be formed from the given set by subtracting the median value of the corresponding
population from each observation. Let $y(1), \cdots, y(n)$ be the values
of the set of modified observations arranged in increasing order of magnitude.
Since $\Phi < 0$, $\theta < \phi$ and

$y(t) = x(t) - \phi, \quad 1 \le t \le n - r$,
$y(t) = x(t) - \theta, \quad n - r + 1 \le t \le n$.

Thus

$P_3(\Phi) = \Pr\{\max[y(n + 1 - j_k) + y(i_k);\ 1 \le k \le s \le r] - 2y(n + 1 - W_\alpha) < -\Phi\}$,

whence it follows that $P_3(\Phi)$ is a monotonically decreasing function of $\Phi$ for
$\Phi < 0$ and that $\lim_{\Phi\to-\infty} P_3(\Phi) = 1$.

Now consider the case where $\Phi > 0$. Again form the set of modified observations
and let $y(1), \cdots, y(n)$ be the values of these observations arranged in
increasing order of magnitude. Then it is easily seen that

$P_3(\Phi) \le \Pr[y(1) - y(n) < -\tfrac{1}{2}\Phi]$

so that $\lim_{\Phi\to\infty} P_3(\Phi) = 0$.


REFERENCES 

[1] Paul G. Hoel, Introduction to Mathematical Statistics, John Wiley and Sons, 1947,
p. 45.

[2] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.



ON A MEASURE OF DEPENDENCE BETWEEN 
TWO RANDOM VARIABLES 

By Nils Blomqvist 

University of Stockholm and Boston University 

1. Summary. The properties of a measure of dependence q' between two 
random variables are studied. It is shown (Sections 3-5) that q' under fairly 
general conditions has an asymptotically normal distribution and provides 
approximate confidence limits for the population analogue of $q'$. A test of
independence based on $q'$ is non-parametric (Section 6), and its asymptotic efficiency
in the normal case is about 41% (Section 7). The $q'$-distribution in the case of
independence is tabulated for sample sizes up to 50.

2. Introduction and definitions. In drawing conclusions from statistical data 
it frequently happens that it is unnecessary to utilize all the information given 
by the data. In such cases it seems desirable to use methods which are 

1) valid under rather weak assumptions regarding the distribution of the 
population and 

2) easy to deal with in practice. 

Naturally such methods should always be used, but their applicability is, in 
most cases, limited by their small efficiency. 

Concerning methods of measuring correlation and testing independence some
so-called rank correlation coefficients have been defined [2, 3, 4, 6] which have
the first property. In large samples these are, however, rather tiresome to calculate,
and a simpler method might then be preferable. The coefficient studied
here has in most cases both properties mentioned above and can be used whenever
its efficiency is not too small.

Let $(x_1, y_1), \cdots, (x_n, y_n)$ be a sample from a two-dimensional population with
cdf $F(x, y)$, and consider the two sample medians $\tilde{x}$ and $\tilde{y}$. The cdf $F(x, y)$ is
assumed to have continuous marginal cdf's $F_1(x)$ and $F_2(y)$ in order that the
probability of obtaining two equal x-values or two equal y-values in the sample
will be zero. Let the x, y-plane be divided into four regions by the lines $x = \tilde{x}$
and $y = \tilde{y}$. It is then clear that some information about the correlation between
x and y can be obtained from the number of sample points, say $n_1$, belonging
to the first or third quadrants compared with the number, say $n_2$, belonging
to the second or fourth quadrants.

Before going further we shall explain what is meant here by "belonging to". If
the sample size n is an even number the calculation of $n_1$ and $n_2$ is evident. If,
however, n is an odd number, one or two sample points must fall on the lines
$x = \tilde{x}$ and $y = \tilde{y}$. In the first case this sample point shall not be counted. In
the other case one point falls on each of the lines. Then one of the points shall
be said to belong to the quadrant touched by both points, while the other shall
not be counted. It is easy to verify that both $n_1$ and $n_2$ by this method will be
even numbers.

As a measure of correlation we define

(1)    $q' = \dfrac{n_1 - n_2}{n_1 + n_2} = \dfrac{2 n_1}{n_1 + n_2} - 1 \qquad (-1 \le q' \le 1).$


The definition of q' is not new [5] but as far as is known, its statistical
properties have never been studied completely.
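
The counting rule of this section can be sketched in code. The sketch below is illustrative only: the function name `blomqvist_q` and its return value are assumptions of this example, not notation from the paper, and it assumes at least two observations with no tied x-values and no tied y-values (which the continuity assumption guarantees with probability one).

```python
from statistics import median

def blomqvist_q(xs, ys):
    """Return (q', n1, n2) for paired observations without ties.

    Hypothetical helper following the 'belonging to' rule of Section 2.
    """
    mx, my = median(xs), median(ys)
    n1 = n2 = 0
    on_x_line = None  # y-offset of a point lying on the line x = x-median
    on_y_line = None  # x-offset of a point lying on the line y = y-median
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        if dx == 0 and dy == 0:
            continue              # one point determines both medians: not counted
        if dx == 0:
            on_x_line = dy
        elif dy == 0:
            on_y_line = dx
        elif dx * dy > 0:
            n1 += 1               # first or third quadrant
        else:
            n2 += 1               # second or fourth quadrant
    if on_x_line is not None and on_y_line is not None:
        # Two points lie on the median lines: exactly one of them is counted,
        # in the quadrant touched by both.
        if on_x_line * on_y_line > 0:
            n1 += 1
        else:
            n2 += 1
    return (n1 - n2) / (n1 + n2), n1, n2
```

For even n the sample medians fall strictly between observations, so no point lies on either line and the special cases are never triggered.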


3. The asymptotic distribution. It is known [1] that the median in a sample
from a one-dimensional distribution under certain conditions is a consistent
estimate of the population median and asymptotically normally distributed.
Although it seems possible to weaken the requirements in our case, we shall not
do so. We require that

a) the population medians are uniquely defined (and assumed to equal zero),

b) the marginal distributions of $F(x, y)$ admit density functions $f_1(x)$ and
$f_2(y)$,

c) $f_1(x)$, $f_2(y)$ and their first derivatives are continuous in some neighbourhood
of the origin and

d) $f_1(0)$ and $f_2(0)$ are $\neq 0$.

In order to avoid trivial complications we shall assume here that the sample
size $n = 2k + 1$.

Now define for every arbitrarily chosen point $(x, y)$

(2)    $a(x, y) = P\{\xi > x,\ \eta > y\},$
       $b(x, y) = P\{\xi < x,\ \eta > y\},$
       $c(x, y) = P\{\xi < x,\ \eta < y\},$
       $d(x, y) = P\{\xi > x,\ \eta < y\},$

where the measure P refers to the cdf $F(x, y)$ and evidently

       $a + b + c + d = 1.$

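As the number of sample points belonging to the first and third quadrants
around $(\tilde{x}, \tilde{y})$ must be equal, the probability of the combined event

       $\{n_1 = 2r;\ \tilde{x} \in (x, x + dx),\ \tilde{y} \in (y, y + dy)\}$

is

(3)    $p_k(2r; x, y) = \dfrac{(2k+1)!}{(r!)^2\,[(k-r)!]^2}\,(ac)^r\,(bd)^{k-r}\, S,$

where

(4)    $S = \dfrac{r}{a}\, d_x a \cdot d_y a - \dfrac{k-r}{b}\, d_x b \cdot d_y b + \dfrac{r}{c}\, d_x c \cdot d_y c - \dfrac{k-r}{d}\, d_x d \cdot d_y d + dF.$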



Each of the first four terms of the expression (4) corresponds to a case in which two
sample points determine $(\tilde{x}, \tilde{y})$, and the last term to a case in which $(\tilde{x}, \tilde{y})$
is determined by only one point. From (3) it follows that the probability of
obtaining $n_1$ at most equal to $2R$ is

(5)    $P\{n_1 \le 2R\} = \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{r=0}^{R} p_k(2r; x, y).$

If we introduce the joint cdf $\Psi_k(x, y)$ of $\tilde{x}$ and $\tilde{y}$, (5) can be written as

(6)    $P\{n_1 \le 2R\} = \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} d\Psi_k(x, y)\, \frac{\sum_{r=0}^{R} p_k(2r; x, y)}{\sum_{r=0}^{k} p_k(2r; x, y)},$

where

       $d\Psi_k(x, y) = \displaystyle\sum_{r=0}^{k} p_k(2r; x, y).$

Clearly the integrand in (6) is $\le 1$ everywhere it exists. At the points (x, y)
where the denominator is equal to zero the integrand is undefined, but as the
measure ($\Psi_k$) of the set of such points is zero, we need not have any trouble
with them.

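Under the conditions a)-d) $\tilde{x}$ and $\tilde{y}$ converge in probability to zero; that is

       $\displaystyle\lim_{k \to \infty} \Psi_k(x, y) = \begin{cases} 1 & \text{for } \{x > 0,\ y > 0\}, \\ 0 & \text{otherwise.} \end{cases}$

Thus, when k and R tend to infinity such that $R/k \to$ const, (6) becomes

(7)    $\lim P\{n_1 \le 2R\} = \lim \dfrac{\sum_{r=0}^{R} p_k(2r; 0, 0)}{\sum_{r=0}^{k} p_k(2r; 0, 0)}.$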

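According to (3)

(8)    $p_k(2r; 0, 0) = \dfrac{(2k+1)!}{(r!)^2\,[(k-r)!]^2}\,(a_0 c_0)^r\,(b_0 d_0)^{k-r}\, S_0,$

where the subscripts indicate the value at the point (0, 0). Because of (2),

       $c_0 = a_0,\quad d_0 = b_0 \quad\text{and}\quad a_0 + b_0 = \tfrac{1}{2},$

and the two parts of (8) are for large k

       $\dfrac{(2k+1)!}{(r!)^2\,[(k-r)!]^2}\, a_0^{2r}\, b_0^{2(k-r)} \sim \dfrac{1}{2\pi a_0 b_0 \sqrt{2\pi k}}\; e^{-(r - 2k a_0)^2 / (4 k a_0 b_0)}$

and

       $S_0 \sim 2k \left[ \left(\dfrac{\partial a}{\partial x}\right)_0 \left(\dfrac{\partial a}{\partial y}\right)_0 - \left(\dfrac{\partial b}{\partial x}\right)_0 \left(\dfrac{\partial b}{\partial y}\right)_0 + \left(\dfrac{\partial c}{\partial x}\right)_0 \left(\dfrac{\partial c}{\partial y}\right)_0 - \left(\dfrac{\partial d}{\partial x}\right)_0 \left(\dfrac{\partial d}{\partial y}\right)_0 \right] dx\, dy.$

The first of these expressions follows from the usual application of Stirling's
approximation formula and we omit all details here.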

Hence, after the introduction of

       $r = 2k a_0 + t \sqrt{2k a_0 b_0},$

       $R = 2k a_0 + T \sqrt{2k a_0 b_0},$

the expression (7) is transformed to

(9)    $\lim P\left\{ \dfrac{n_1 - 4k a_0}{\sqrt{8k a_0 b_0}} \le T \right\} = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{T} e^{-t^2/2}\, dt.$

From (9) it follows that $n_1$ is asymptotically normally distributed with mean
$4k a_0$ and standard deviation $\sqrt{8k a_0 b_0}$. Thus

       $q' = \dfrac{n_1}{k} - 1$

is asymptotically normally distributed with mean $4a_0 - 1$ and standard deviation
$2\sqrt{a_0 (1 - 2a_0)/k}$.


4. Properties as an estimator. Suppose we measure the correlation between
x and y by

(10)    $q = 2\left[\displaystyle\int_{-\infty}^{0}\int_{-\infty}^{0} dF + \int_{0}^{\infty}\int_{0}^{\infty} dF\right] - 1 = 4a_0 - 1,$

where, as before, (0, 0) are the coordinates of the population medians. Then q
has the desired property of being equal to zero in the case of independence and
equal to ±1 in the case of linear relationship between x and y.

According to (9) q' is a consistent estimate of q when the conditions a)-d) are
fulfilled. Furthermore, as the standard deviation of q' is, to a first approximation,
independent of quantities other than q, it is possible to construct approximate
confidence limits for q for large sample sizes. This is done in the following way.
In terms of n and q we have, according to the last paragraph of Section 3 and
(10),

       $E q' \sim q,$

       $\sigma(q') \sim \sqrt{\dfrac{1 - q^2}{n}}.$

Let $\Phi(t)$ be a standardized normal cdf and $\lambda_1$ and $\lambda_2$ two numbers such that
$\Phi(\lambda_2) - \Phi(\lambda_1) = 1 - \alpha$. According to (9) we then have

(11)    $P\left\{ \lambda_1 < \dfrac{q' - q}{\sqrt{1 - q^2}}\, \sqrt{n} < \lambda_2 \right\} \sim 1 - \alpha,$

which gives the desired result.

If we let $\lambda_2 = -\lambda_1 = \lambda$ and solve the inequality in (11) for q, the following
symmetrical confidence interval is obtained

       $q' - \dfrac{\lambda}{n} \sqrt{\lambda^2 + n(1 - q'^2)} < q < q' + \dfrac{\lambda}{n} \sqrt{\lambda^2 + n(1 - q'^2)},$

where we have used that $\lambda^2 \ll n$.
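
As an illustration of the symmetrical interval, the following sketch evaluates $q' \pm (\lambda/n)\sqrt{\lambda^2 + n(1 - q'^2)}$ for a given q', n and α. The function name `q_confidence_interval` is an assumption of this example; the normal quantile is taken from Python's standard library. The interval is a large-sample approximation and is not clipped to [-1, 1] here.

```python
from math import sqrt
from statistics import NormalDist

def q_confidence_interval(q_prime, n, alpha=0.05):
    """Approximate (1 - alpha) confidence limits for q, valid when lambda^2 << n."""
    lam = NormalDist().inv_cdf(1 - alpha / 2)   # lambda with Phi(lambda) = 1 - alpha/2
    half_width = (lam / n) * sqrt(lam ** 2 + n * (1 - q_prime ** 2))
    return q_prime - half_width, q_prime + half_width

# Example call: observed q' = 0.30 in a sample of n = 200.
lower, upper = q_confidence_interval(0.30, 200)
```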


5. The normal case. If x and y are normally distributed with correlation
coefficient ρ, we have

(12)    $q = \dfrac{2}{\pi} \arcsin \rho.$

This expression is the same as the mean of Esscher-Kendall's rank correlation
coefficient τ [2, 4]. Hence, in the normal case q' and τ estimate the same quantity.
The coefficient q' has, however, a much smaller efficiency. The asymptotic
efficiency of q' relative to the aforementioned coefficient is

       $\dfrac{\sigma^2(\tau)}{\sigma^2(q')} = \dfrac{4}{9}$    for ρ = 0.

6. Tests of independence based on q'. In testing independence between x
and y it is in practice more convenient to use critical regions based on $n_1$ instead
of q'. Since, under the null hypothesis, the measure of a critical region is
independent of $F(x, y)$ ($F_1(x)$ and $F_2(y)$ are assumed to be continuous), any test
based on $n_1$ is non-parametric. We have made exact calculations of the
q'-distribution for sample sizes n up to 50. For larger sample sizes the normal
approximation for $n_1$ does not seem to entail errors of practical importance.

To derive the exact distribution of $n_1$ under the null hypothesis we suppose
that n equals 2k. The probability that any k sample points shall have smaller
x-values than the other k points is

       $1 \Big/ \dbinom{2k}{k}.$

Hence, since any arrangement of the sample points according to their x-values
does not affect the distribution of the y-values,

(13)    $P\{n_1 = 2r\} = \dbinom{k}{r}^{2} \Big/ \dbinom{2k}{k}.$

If n = 2k + 1 it is easily verified that the probability (13) remains unchanged,
if we use the procedure in calculating $n_1$ and $n_2$ proposed in Section 2. This is,
in fact, the main reason for the proposal.


TABLE. $P\{|n_1 - k| \ge \nu\}$ under the hypothesis of independence, for
2k = 4, 8, 12, ..., 48 (even ν) and for 2k = 2, 6, 10, ..., 50 (odd ν).
[Tabulated entries illegible in this copy.]

2k is the largest even number contained in the sample size.


The distribution of $n_1$ is symmetric about $n_1 = k$ with the variance

       $\dfrac{k^2}{2k - 1}.$

Thus, in testing independence we can for large sample sizes use

       $\dfrac{n_1 - k}{k} \cdot \sqrt{2k - 1}$

as an approximately normally distributed random variable with mean zero
and unit s.d.
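
The exact null distribution (13), the two-sided tail probabilities $P\{|n_1 - k| \ge \nu\}$, and the large-sample statistic can all be computed directly. The sketch below is illustrative; the function names `null_pmf`, `two_sided_tail` and `normal_statistic` are assumptions of this example and do not appear in the paper.

```python
from math import comb, sqrt

def null_pmf(k):
    """P{n1 = 2r} for r = 0..k under independence, keyed by the value of n1."""
    total = comb(2 * k, k)
    return {2 * r: comb(k, r) ** 2 / total for r in range(k + 1)}

def two_sided_tail(k, nu):
    """P{|n1 - k| >= nu} under independence, summed from the exact distribution."""
    return sum(p for n1, p in null_pmf(k).items() if abs(n1 - k) >= nu)

def normal_statistic(n1, k):
    """Standardized statistic (n1 - k) * sqrt(2k - 1) / k for large samples."""
    return (n1 - k) * sqrt(2 * k - 1) / k

# Example: sample size 20 or 21 (2k = 20); exact tail probability for |n1 - 10| >= 6.
p_value = two_sided_tail(10, 6)
```

Because the distribution depends on the sample only through k, the same call serves for n = 2k and n = 2k + 1.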


7. The asymptotic efficiency of the q'-test. In the case that x and y are
normally distributed with the correlation coefficient ρ, it is possible, but rather
tedious, to calculate the power function of the q'-test. We will, therefore, restrict
ourselves to considering only the asymptotic behavior of the power function.

Consider tests of independence (ρ = 0) against one-sided alternatives ρ > 0.
Let $L_m^{(1)}(\rho)$ be the power function of the q'-test for the sample size m and $L_n^{(2)}(\rho)$
be the power function of the test based on the correlation coefficient r in a
sample of size n. We assume that all tests have the same size, i.e.

(14)    $L_m^{(1)}(0) = L_n^{(2)}(0) = \alpha$

for all m and n. We shall say that the q'-test has the asymptotic efficiency e if

(15)    $\lim \dfrac{\left(\partial L_m^{(1)} / \partial \rho\right)_{\rho = 0}}{\left(\partial L_n^{(2)} / \partial \rho\right)_{\rho = 0}} = 1 \qquad \text{when } m = \dfrac{n}{e}.$

This means that the sample size in using the r-test need only be 100e% of
that in using the q'-test, in order to get the same derivative of the power functions
at ρ = 0 (for large sample sizes). Since the definition of e only concerns the
behavior in the neighborhood of ρ = 0, it might perhaps be more correct to call e
the asymptotic local efficiency.

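In order to calculate e we define two sequences $\{q_m\}$ and $\{r_n\}$ such that
$\{q' > q_m\}$ and $\{r > r_n\}$ are tests with the aforementioned properties. According
to (9) and (10) q' is asymptotically normally distributed with mean q and s.d.
$\sqrt{(1 - q^2)/m}$. Furthermore, r is asymptotically normally distributed with mean
ρ and s.d. $(1 - \rho^2)/\sqrt{n}$. Hence,

       $1 - L_m^{(1)}(\rho) = P\{q' < q_m \mid \rho\} \sim \Phi\left( \dfrac{q_m - q}{\sqrt{1 - q^2}}\, \sqrt{m} \right),$

       $1 - L_n^{(2)}(\rho) = P\{r < r_n \mid \rho\} \sim \Phi\left( \dfrac{r_n - \rho}{1 - \rho^2}\, \sqrt{n} \right),$

from which it follows

(16)    $\left( \dfrac{\partial L_m^{(1)}}{\partial \rho} \right)_0 \sim \left( \dfrac{dq}{d\rho} \right)_0 \sqrt{m}\; \Phi'(q_m \sqrt{m}), \qquad \left( \dfrac{\partial L_n^{(2)}}{\partial \rho} \right)_0 \sim \sqrt{n}\; \Phi'(r_n \sqrt{n}).$

According to (14) we have

       $\displaystyle\lim_{m \to \infty} q_m \sqrt{m} = \lim_{n \to \infty} r_n \sqrt{n} = \Phi^{-1}(1 - \alpha).$

Thus we conclude

(17)    $\lim \dfrac{\left(\partial L_m^{(1)} / \partial \rho\right)_0}{\left(\partial L_n^{(2)} / \partial \rho\right)_0} = \left( \dfrac{dq}{d\rho} \right)_0 \lim \sqrt{\dfrac{m}{n}}.$

Clearly (17) is equal to 1 if

       $\dfrac{n}{m} = \left( \dfrac{dq}{d\rho} \right)_0^{2}.$

Hence, according to (12) and (15)

       $e = \left( \dfrac{dq}{d\rho} \right)_0^{2} = \dfrac{4}{\pi^2} \approx 0.41.$

In other words, the asymptotic efficiency of the q'-test is about 41%.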
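
The figure of about 41% can be checked numerically from (12). The sketch below differentiates $q(\rho) = (2/\pi)\arcsin\rho$ at ρ = 0 by a central difference and squares the result; the function names and the step size h are assumptions of this example.

```python
from math import asin, pi

def q_of_rho(rho):
    """Population value of q in the bivariate normal case, formula (12)."""
    return 2 / pi * asin(rho)

def local_efficiency(h=1e-6):
    """Square of dq/d(rho) at rho = 0, estimated by a central difference."""
    dq = (q_of_rho(h) - q_of_rho(-h)) / (2 * h)
    return dq ** 2

efficiency = local_efficiency()   # close to 4 / pi**2
```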

8. Concluding remarks. An interesting similarity exists between the q'-test
of independence and a test of equal location parameters in two distributions,
constructed in the following way. Suppose that two samples of equal size, say k,
are drawn independently from two distributions. Compute the number of
individuals, say r, in the first sample, falling short of the median of the pooled
samples. Then the distribution of 2r under the null hypothesis is the same as
that of $n_1$ in the q'-test for sample size 2k (or 2k + 1). The test based on r was
discussed by F. Mosteller [7].

Another similarity is between the q'-test and a special case of the exact test of
independence in a 2 x 2 table [8]. If in such a table the marginals happen to be cut
at the 50% points the two test procedures become identical.

REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] F. Esscher, "On a method of determining correlation from the ranks of a variate",
Skandinavisk Aktuarietidskrift, Vol. 7 (1924), p. 201.

[3] W. Hoeffding, "A non-parametric test of independence", Annals of Math. Stat.,
Vol. 19 (1948), p. 546.

[4] M. G. Kendall, "A new measure of rank correlation", Biometrika, Vol. 30 (1938), p. 81.

[5] F. Mosteller, "On some useful 'inefficient' statistics", Annals of Math. Stat., Vol. 17
(1946), p. 377.

[6] C. Spearman, "The proof and measurement of association between two things", Am.
Jour. of Psych., Vol. 15 (1904), p. 88.

[7] F. Mosteller, "On some useful 'inefficient' statistics", unpublished thesis, Princeton
University, 1946.

[8] R. A. Fisher, Statistical Methods for Research Workers, 8th Ed., Stechert & Co., 1941.



SOME TWO SAMPLE TESTS 

By Douglas G. Chapman 1 
University of Washington 

1. Introduction and summary. Stein [4] has exhibited a double sampling
procedure to test hypotheses concerning the mean of normal variables with power
independent of the unknown variances. This procedure is here adapted to test
hypotheses concerning the ratio of means of two normal populations, also with
power independent of the unknown variances. The use of a two sample procedure
in a regression problem is also considered.

Let $(X_{ij})$ (i = 1, 2) (j = 1, 2, 3, ...) be independent random variables
distributed according to $N(m_i, \sigma_i^2)$: all parameters are assumed to be unknown.

Defining k by the equation

(1)    $m_1 = k m_2$

we wish to test the hypothesis H that k has a specified value $k_0$.

If $k_0 = 1$ the hypothesis H reduces to a classical problem, often referred to
in the literature as the Behrens-Fisher problem (cf. Scheffé [3] for a bibliography).
At the present time it is still an open question whether it is possible (or desirable)
to find a non-trivial single sample test for H with the size of the critical region
independent of $\sigma_1$ and $\sigma_2$. In any case it is a simple extension of the result of
Dantzig [1] (cf. also Stein [4]) to show that no non-trivial single sample test
exists whose power is independent of $\sigma_1$ and $\sigma_2$.

On the other hand the case $k_0 \neq 1$ may be expected to occur frequently in
fields of application where a choice must be made between different products,
methods of experimentation etc. which involve different costs. The statistician
must make a choice on the basis of results relative to the ratio of costs involved.
Nevertheless this problem appears to have received little attention in the
literature.

In general tests based on a two-sample procedure may not be as “efficient” 
in the sense of Wald [5] as a strict sequential procedure. On the other hand the 
two sample procedure reduces the number of decisions to be made by the experi¬ 
menter and it will, in certain fields, simplify the experimental procedure. 

2. The two sample procedure. Stein’s double sampling procedure (which may 
be denoted procedure S) to test a hypothesis concerning the mean of a normal 
population consists briefly in the following steps: 

(a) Choose "a priori" a positive number z and a preliminary sample size n.

(b) Take n independent observations $x_1, \ldots, x_n$ of the random variable X


1 This research was carried out while the author was at the University of California,
Berkeley, and was supported in part by the Office of Naval Research.



which is assumed to be distributed according to $N(m, \sigma^2)$ with unknown mean m
and unknown variance $\sigma^2$, and calculate

(2)    $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$

(c) Let $N = \max\left\{ \left[ \dfrac{s^2}{z} \right] + 1,\ n + 1 \right\}$ where $[r]$ = largest integer $< r$.

(d) Take N - n more independent observations of X and choose a set of
constants $a_1, \ldots, a_N$ such that

(3)    (i) $\sum_{i=1}^{N} a_i = 1$,    (ii) $a_1 = a_2 = \cdots = a_n$,    (iii) $s^2 \sum_{i=1}^{N} a_i^2 = z$.

(e) Then $\dfrac{\sum_{i=1}^{N} a_i x_i - m}{\sqrt{z}}$ has Student's t-distribution with n - 1 degrees of
freedom.

Stein further showed that the procedure may be modified to some advantage 
in problems dealing with a single population. This modification is not applicable 
in the problems under consideration here. 

There remains to be discussed briefly the choice of n, z and the a's. The
preliminary sample size n may be determined by other considerations or it may be
chosen as part of the design of the experiment. Hodges [2] has shown that the
expected value of the total sample size N and the power of the test both depend
on the choice of n and he has discussed the optimum choice of n with respect
to the modified procedure of Stein. In general this optimum choice of n depends
upon prior knowledge concerning the variance.

The power of the test will depend upon z: some considerations concerning 
the choice of z will be dealt with after discussing the tables upon which the 
two sample tests are based. 

The arbitrariness involved in choosing the a's may be eliminated by placing
the additional requirement that

(4)    $a_{n+1} = a_{n+2} = \cdots = a_N = b$ (say).

Letting $a_1 = a_2 = \cdots = a_n = a$ it is elementary to solve for a and b explicitly,
viz.,

(5)    $na + (N - n)b = 1,$

       $na^2 + (N - n)b^2 = \dfrac{z}{s^2}.$

The solutions are

(6)    $b = \dfrac{1}{N}\left( 1 + \sqrt{\dfrac{n(Nz - s^2)}{(N - n)s^2}} \right),$

(7)    $a = \dfrac{1 - (N - n)b}{n}.$
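
Steps (b) to (d) with the choice (4) can be sketched as follows. This is only an illustration: the function name `stein_second_stage` is an assumption of this example, the sample variance is taken with divisor n - 1, and the bracket $[r]$ is read as the largest integer strictly below r, so that $[r] + 1$ equals the ceiling of r.

```python
from math import ceil, sqrt
from statistics import variance

def stein_second_stage(first_sample, z):
    """Return (N, a, b): total sample size and the two weights of (6) and (7)."""
    n = len(first_sample)
    s2 = variance(first_sample)            # sum of squares about the mean / (n - 1)
    N = max(ceil(s2 / z), n + 1)           # [s2/z] + 1 with [r] = largest integer < r
    # max(0.0, ...) only guards against a tiny negative value from rounding.
    radicand = max(0.0, n * (N * z - s2) / ((N - n) * s2))
    b = (1 + sqrt(radicand)) / N
    a = (1 - (N - n) * b) / n
    return N, a, b

# Example: five preliminary observations and z = 0.5.
N, a, b = stein_second_stage([1, 2, 3, 4, 5], 0.5)
```

The returned weights satisfy both equations of (5), which can be verified by substituting them back.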



3. Test for H. The steps involved in testing the hypothesis H are

(a) Choose a preliminary sample size n, and $z_1$, $z_2$ such that

(8)    $\dfrac{z_1}{z_2} = k_0^2.$

(b) Carry out procedure S with the same n for each population, determining
two statistics $T_1$, $T_2$, i.e.

(9)    $T_i = \dfrac{\sum_{j=1}^{N_i} a_{ij} X_{ij}}{\sqrt{z_i}} \qquad (i = 1, 2).$

Then $T_1 - T_2$ has, under the hypothesis tested, the distribution of the difference
of two independent Student variables.

If s denotes the difference of two independent random variables $t_1$ and $t_2$
each distributed according to Student's t-distribution with n - 1 degrees of
freedom and if $s_0$ is defined by the equation

       $P(|s| > s_0) = \alpha,$

then a test of size α is given by the rule: H is rejected if $|T_1 - T_2| > s_0$.

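4. The distribution of differences of Student variables. The distribution of s
is easily found by the method of characteristic functions, in case n is even.
Let m = n - 1 and to simplify slightly put

(10)    $y_i = \dfrac{t_i}{\sqrt{m}} \qquad (i = 1, 2).$

Then the density function of $y_i$ is

(11)    $f(y) = \dfrac{\Gamma\left(\frac{m+1}{2}\right)}{\sqrt{\pi}\; \Gamma\left(\frac{m}{2}\right)} \cdot \dfrac{1}{(1 + y^2)^{(m+1)/2}}$

and its characteristic function

(12)    $\varphi_y(t) = \displaystyle\int_{-\infty}^{+\infty} e^{ity} f(y)\, dy$

(13)    $\phantom{\varphi_y(t)} = \dfrac{\left(\frac{m-1}{2}\right)!}{(m-1)!}\; e^{-|t|} \displaystyle\sum_{r=0}^{(m-1)/2} \dfrac{\left(\frac{m-1}{2} + r\right)!}{r!\, \left(\frac{m-1}{2} - r\right)!}\; [2|t|]^{(m-1)/2 - r}.$

Formula (13) may be obtained by contour integration; it is, however, a standard
formula in connection with Bessel functions of the second kind of purely
imaginary argument (cf. Watson [6], pp. 80, 185-188).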



While it is not possible to obtain a simple general expression for

(14)    $f(w) = \dfrac{1}{2\pi} \displaystyle\int_{-\infty}^{+\infty} e^{-itw}\, \varphi_y^2(t)\, dt,$

the density function of $w = s/\sqrt{m}$, this integral may be evaluated for m = 1, 3, 5,
etc. and furthermore the density function of s may be integrated in a closed form
for such values of m, and consequently tabulated fairly easily.

In case n is odd it is possible to express $\varphi_y(t)$ in terms of Bessel functions but
the Bessel functions obtained are not expressible in a closed form. While the
problem may be attacked directly by numerical integration, it will generally be
sufficient to interpolate in Table I where necessary, for such values of n.

Table I gives the distribution of s for n = 2, 4, 6, 8, 10, 12. For larger values
of n it may be sufficiently accurate to use the normal approximation to the
distribution of s. In virtue of the asymptotic normality of the t-distribution s
will be distributed approximately normally with mean zero and variance
$2\,\dfrac{n - 1}{n - 3}$ for n sufficiently large.
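
The normal approximation gives an approximate critical value $s_0$ directly. The sketch below is illustrative; the function name `s0_normal_approx` is an assumption of this example. For n = 12 it gives approximately 3.06 at the .05 level and 4.03 at the .01 level, in agreement with the normal-approximation entries of the table of critical values.

```python
from math import sqrt
from statistics import NormalDist

def s0_normal_approx(n, alpha):
    """Approximate s0 with P(|s| > s0) = alpha, treating s as N(0, 2(n-1)/(n-3))."""
    sd = sqrt(2 * (n - 1) / (n - 3))
    return NormalDist().inv_cdf(1 - alpha / 2) * sd

s0_05 = s0_normal_approx(12, 0.05)
s0_01 = s0_normal_approx(12, 0.01)
```

The approximation requires n > 3, since the variance of Student's t with n - 1 degrees of freedom is otherwise infinite.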


5. Power of the test. Writing

(15)    $\Delta = \dfrac{m_1}{\sqrt{z_1}} - \dfrac{m_2}{\sqrt{z_2}}$    and    $T = T_1 - T_2,$

it is seen that $T = s + \Delta$ and hence

(16)    $P\{H \text{ is rejected}\} = P(|T| > s_0) = P(s < -s_0 - \Delta) + P(s > s_0 - \Delta).$

Since

       $\Delta = \dfrac{m_2}{\sqrt{z_1}}\,(k - k_0),$

equation (16) may be used as a guide in choosing $z_1$ so that a certain minimum
power is attained; the presence of the nuisance parameter makes impossible
the determination of $z_1$ so as to give exactly some preassigned power.

Since s is distributed independently of $\sigma_1$, $\sigma_2$, it follows that the power of the
test is independent of these parameters. Using the addition formula to express
the frequency function of s in terms of the frequency function of Student's
t-distribution, it may be shown that f(s) is unimodal and symmetrical about
s = 0. Hence the test is unbiased. It also follows from (16) that if $z_1$ is made to
approach zero the probability of rejecting H when it is false tends to 1: i.e.
the test is consistent.

It may be observed that tests for the one-sided hypotheses

       $\dfrac{m_1}{m_2} > k$    or    $\dfrac{m_1}{m_2} < k$

may easily be formulated. Table II provides a table useful for such tests also,
at half the indicated significance levels.


TABLE I

Distribution of s: difference of two independent Student variables with n - 1 degrees of freedom

The value tabled is $P(0 \le s \le s_0)$


TABLE II

The value tabled is $s_0$

                                              n                        Normal approximation
Significance level        2       4       6      8      10     12         for n = 12

P(|s| ≥ s_0) = .05      25.41   10.82   3.62   3.34   3.18   3.10           3.06
P(|s| ≥ s_0) = .01      127.3   36.8    5.38   4.72   4.42   4.26           4.03

6. A regression problem. We consider the problem where $x_i$ are values of a
sure variable, $Y_i$ are independent random variables with

(17)    $E(Y_i) = a + b x_i$

and $\sigma^2_{Y_i}$ is unknown. It is desired to estimate a and b and to test the hypothesis
$b = b_0$.












The usual procedure is to assume $\sigma^2_{Y_i}$ constant, and use the Markov theorem
(i.e. the standard least squares formulae). In this way unbiased estimates of
a and b are obtained, whether or not this assumption is fulfilled. However the
usual significance test for b is not valid if this assumption (plus normality of
the Y's) is not fulfilled.

The two sample procedure leads to a valid test of the hypothesis $b = b_0$, with
power independent of the unknown variance. Since linearity of the expected
value of Y on x is assumed, the optimum procedure is to observe Y for only two
values of x, at opposite ends of the range. Let these points be $x_1$, $x_2$. For these
values of x, procedure S may be used (choosing $z_1 = z_2 = z$) to determine $T_1$, $T_2$
where $T_i - (a + b x_i)/\sqrt{z}$ has Student's t-distribution with n - 1 degrees of
freedom.

Then the following estimates of a, b are unbiased, for n > 3,

(18)    $\hat{b} = \dfrac{(T_1 - T_2)\sqrt{z}}{x_1 - x_2}, \qquad \hat{a} = \dfrac{(x_1 T_2 - x_2 T_1)\sqrt{z}}{x_1 - x_2}.$

To test the hypothesis $H_1$: $b = b_0$ it is necessary only to calculate the statistic
$\xi = [(T_1 - T_2)\sqrt{z} - b_0(x_1 - x_2)]/\sqrt{z}$ and reject $H_1$ at the α level of
significance if $|\xi| > s_0$, where $s_0$ was defined above (Section 3).

It is seen that if b' is the true value of b, then the power of the test is a function
of $(b' - b_0)(x_1 - x_2)/\sqrt{z}$ and z may be determined to obtain any prescribed power
desired. It is also immediate that the power of the test is independent of $\sigma_{Y_i}$.

The author wishes to express thanks to the members of the computing staff 
of the Statistical Laboratory, University of California, Mrs. E. Putz, Miss J. 
Linton, and Mr. J. Blum, for assistance in preparing Tables I and II.* 

REFERENCES

[1] George B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having
power functions independent of σ," Annals of Math. Stat., Vol. 11 (1940), p. 186.

[2] Joseph L. Hodges, Jr., "The selection of initial sample size in the Stein two sample
procedure", unpublished dissertation, University of California, Berkeley, 1948.

[3] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the
t-distribution", Annals of Math. Stat., Vol. 14 (1943), p. 35.

[4] Charles Stein, "A two sample test for a linear hypothesis whose power is independent
of the variance", Annals of Math. Stat., Vol. 16 (1945), p. 243.

[5] Abraham Wald, Sequential Analysis, John Wiley and Sons, Inc., 1947.

[6] G. N. Watson, A Treatise on the Theory of Bessel Functions, Cambridge University
Press, 1944.


* It has been pointed out to the writer that percent points of linear combinations of
two independent Student t's are given in Table VI (by P. V. Sukhatme) in R. A. Fisher
and F. Yates, Statistical Tables for Biological, Medical and Agricultural Research, Oliver
and Boyd, Edinburgh, 1943 (added in page proof).



NOTES 

This section is devoted to brief research and expository articles and other short items. 


TRANSFORMATIONS RELATED TO THE ANGULAR AND 
THE SQUARE ROOT 

By Murray F. Freeman and John W. Tukey 1
Princeton University

1. Summary. The use of transformations to stabilize the variance of binomial
or Poisson data is familiar (Anscombe [1], Bartlett [2, 3], Curtiss [4], Eisenhart
[5]). The comparison of transformed binomial or Poisson data with percentage
points of the normal distribution to make approximate significance tests or
to set approximate confidence intervals is less familiar. Mosteller and Tukey [6]
have recently made a graphical application of a transformation related to the
square-root transformation for such purposes, where the use of "binomial
probability paper" avoids all computation. We report here on an empirical study
of a number of approximations, some intended for significance and confidence
work and others for variance stabilization.

For significance testing and the setting of confidence limits, we should like
to use the normal deviate K exceeded with the same probability as the number of
successes x from n in a binomial distribution with expectation np, which is
defined by

       $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{K} e^{-t^2/2}\, dt = \text{Prob}\,\{x \le k \mid \text{binomial}, n, p\}.$

The most useful approximations to K that we can propose here are N (very
simple), N⁺ (accurate near the usual percentage points), and N** (quite accurate
generally), where

       $N = 2\left( \sqrt{(k + 1)q} - \sqrt{(n - k)p} \right).$

(This is the approximation used with binomial probability paper.)

       $N^{+} = N + \dfrac{N + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq,$

       $N^{*} = N + \dfrac{(N - 2)(N + 2)}{12} \left( \dfrac{1}{\sqrt{np + 1}} - \dfrac{1}{\sqrt{nq + 1}} \right),$

       $N^{**} = N^{*} + \dfrac{N^{*} + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq.$
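
The approximations can be compared with the exact deviate K by direct computation. The sketch below is illustrative only: the function names `exact_K` and `approximations` are assumptions of this example, the formulas for N, N⁺, N* and N** are coded as they are read in this summary, and `exact_K` requires 0 ≤ k < n so that the binomial probability lies strictly between 0 and 1.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_K(k, n, p):
    """Normal deviate K with Phi(K) = Prob{x <= k | binomial, n, p}, for k < n."""
    cdf = sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))
    return NormalDist().inv_cdf(cdf)

def approximations(k, n, p):
    """Return the four approximations to K as a dict keyed by their names."""
    q = 1 - p
    E = min(n * p, n * q)
    N = 2 * (sqrt((k + 1) * q) - sqrt((n - k) * p))
    N_plus = N + (N + 2 * p - 1) / (12 * sqrt(E))
    N_star = N + (N - 2) * (N + 2) / 12 * (1 / sqrt(n * p + 1) - 1 / sqrt(n * q + 1))
    N_2star = N_star + (N_star + 2 * p - 1) / (12 * sqrt(E))
    return {"N": N, "N+": N_plus, "N*": N_star, "N**": N_2star}

# Example: n = 20, p = 0.5, k = 10.
K = exact_K(10, 20, 0.5)
approx = approximations(10, 20, 0.5)
errors = {name: value - K for name, value in approx.items()}
```

When np = nq the correction term in N* vanishes, so N* coincides with N in this example.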

For variance stabilization, the averaged angular transformation

       $\sin^{-1}\sqrt{\dfrac{x}{n + 1}} + \sin^{-1}\sqrt{\dfrac{x + 1}{n + 1}}$

has variance within ±6% of

       $\dfrac{1}{n + \frac{1}{2}}$ (angles in radians),    $\dfrac{821}{n + \frac{1}{2}}$ (angles in degrees),

for almost all cases where np ≥ 1.

In the Poisson case, this simplifies to using

       $\sqrt{x} + \sqrt{x + 1}$

as having variance 1.

1 Prepared in connection with research sponsored by the Office of Naval Research.
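
The variance of the angular sum can be computed exactly for any given binomial by summing over the distribution. The sketch below is an illustration in radians only; the function name `angular_sum_variance` is an assumption of this example.

```python
from math import asin, comb, sqrt

def angular_sum_variance(n, p):
    """Exact variance (radians) of asin sqrt(x/(n+1)) + asin sqrt((x+1)/(n+1))."""
    mean = mean_sq = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p ** x * (1 - p) ** (n - x)
        t = asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1)))
        mean += prob * t
        mean_sq += prob * t * t
    return mean_sq - mean * mean

# Example: n = 20, p = 0.5; compare with 1 / (n + 1/2).
ratio = angular_sum_variance(20, 0.5) * (20 + 0.5)
```

A ratio near 1 indicates that the stabilized variance is close to the stated radian value for that n and p.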


2. Significance testing. In addition to the approximations mentioned above,
empirical study was also made of the following

       $L = \dfrac{x - np}{\sqrt{npq}},$

       L* = L modified by a term like that in N*,

       $M = 2\sqrt{n + 1}\,\left( \sin^{-1}\sqrt{\cdots} - \sin^{-1}\sqrt{p} \right),$

       M* = M modified by a term like that in N*.

Taking an upper limit of 2.5 or 3.5 on |K| and a lower limit of 0.01, 1, or 4
on np, the greatest observed errors of the approximations were smallest for
N**, N* and M* and largest for the direct approximations L and L*. This
was true for all six choices of region.

If we exclude the cases k = 0 and k = n, where the desired probability can be
calculated directly, the largest observed errors in the substantial number of
cases computed, which are probably representative of the regions where the
approximations are worst, were as follows:

                              Largest observed error of
|K|      E = np     N**     N*      ...     ...     N       M       L*      L

<2.5     ≥4         .04     .07     .08     .14     ...     .17     .26     .35
         ≥1         .04     .09     .13     .19     .20     .24     .36     .42
         ≥0.01      .04     .20     .20     .19     ...     .66     .62     .80

<3.5     ≥4         .08     .07     .08     .19     .25     ...     ...     .63
         ≥1         .11     .10     .17     .21     .38     ...     ...     1.26
         ≥0.01      .11     .51     .60     .21     .05     ...     5.83    3.42

Within the range of great interest, |K| < 2.5, that is .0062 < probability
< .9938, we have errors of less than 0.04 in N** and less than 0.20 in N.

For 1.5 < |K| < 2.5, the range of greatest interest, the average error of
N⁺ was less than 0.03 and the maximum was 0.08 (54 cases considered).






Thus, we can recommend

       N as a simple and usually accurate transformation,

       N⁺ for rapid significance testing,

       N** for adequate accuracy at all levels.

Figure 1 shows the behavior of the various approximations in the case n = 60,
np = 6. This is roughly typical.

Fig. 1. Errors of approximation.


3. Variance Stabilization. The various suggestions for stabilizing the variance
of the Poisson are:

       $\sqrt{x + 1/2}$    (Bartlett [2]),

       $\sqrt{x + 3/8}$    (Anscombe [1]),

       $\sqrt{x} + \sqrt{x + 1}$    (this paper).

Figure 2 shows the variance of these as a function of the Poisson expectation.
Clearly $\sqrt{x} + \sqrt{x + 1}$ is preferable if small expectations are to be considered.
The simplicity with which it can be read from a square-root table, and its unit
variance, are also favorable.

When an approximation of a given form is to work over as wide a range as
possible without the magnitude of its errors exceeding a certain limit, the
optimum approximation is almost certain to involve errors of both signs. If ±6%
variation in variance is permissible, $\sqrt{x} + \sqrt{x + 1}$ is usable for expectations
of unity or more. It is not surprising that Anscombe's approximation, obtained
by eliminating the term in $n^{-1}$, and dominated by the term in $n^{-2}$, should only
meet the ±6% tolerance for expectations of 2.2 or more.
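
The variances of the three square-root transformations can be evaluated numerically for any Poisson expectation by summing over the distribution. The sketch below is illustrative; the function and variable names, and the truncation point of the sum, are assumptions of this example. The first two transformations are compared with 1/4 and the third with 1.

```python
from math import exp, sqrt

def poisson_transform_variance(mean, transform, tail=60):
    """Variance of transform(x) for x ~ Poisson(mean), by a truncated direct sum."""
    upper = int(mean + tail * sqrt(mean) + tail)   # generous truncation point
    prob = exp(-mean)                              # P(x = 0)
    m1 = m2 = 0.0
    for x in range(upper + 1):
        t = transform(x)
        m1 += prob * t
        m2 += prob * t * t
        prob *= mean / (x + 1)                     # recurrence to P(x + 1)
    return m2 - m1 * m1

def bartlett(x):
    return sqrt(x + 0.5)

def anscombe(x):
    return sqrt(x + 0.375)

def freeman_tukey(x):
    return sqrt(x) + sqrt(x + 1)

# Example: variances at expectation 10.
v_bartlett = poisson_transform_variance(10, bartlett)
v_anscombe = poisson_transform_variance(10, anscombe)
v_ft = poisson_transform_variance(10, freeman_tukey)
```

Repeating the call over a grid of expectations gives the kind of variance curve that the comparison above refers to.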



4. Scope. Values of K, and with some occasional exceptions, of L, L*,
M*, N, N⁺, N* and N** were calculated for

       n = 2, 5, 10, 20, 100,

       p = 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,

       k giving K < 4.5,

and similar computations were made for the Poisson case with expectations
1/100, 1/50, 1/20, 1/10, 1/5, 1/2, 1, 2, 4, 8, 16, 32, 64.



These computations were made to only two decimal places, so that the final
results may easily err by 1, 2, or 3 in the second decimal place.

A more complete discussion of the problem, the origin of the approximations,
and tables showing a representative collection of actual values can be found in
Memorandum Report 24 of the Statistical Research Group, Princeton University,
which bears the same title as this note. Copies may be obtained from its
Secretary, Box 708, Princeton, N. J.

REFERENCES 

[1] F. J. Anscombe, "The transformation of Poisson, binomial, and negative binomial
data", Biometrika, Vol. 35 (1948), pp. 246-254.

[2] M. S. Bartlett, "The square root transformation in the analysis of variance", Jour.
Roy. Stat. Soc., Suppl., Vol. 3 (1936), pp. 68-78.

[3] M. S. Bartlett, "The use of transformations", Biometrics, Vol. 3 (1947), pp. 39-51.

[4] J. H. Curtiss, "On transformations used in the analysis of variance", Annals of Math.
Stat., Vol. 14 (1943), pp. 107-122.

[5] Churchill Eisenhart, "The assumptions underlying the analysis of variance",
Biometrics, Vol. 3 (1947), pp. 1-21.

[6] Frederick Mosteller and John W. Tukey, "The uses and usefulness of binomial
probability paper", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 174-212.


REMARK ON THE ARTICLE “ON A CLASS OF DISTRIBUTIONS THAT 
APPROACH THE NORMAL DISTRIBUTION FUNCTION” BY 
GEORGE B. DANTZIG 1 

By T, N. E. Greville 
Federal Security Agency 

In this interesting and valuable article, Dr. Dantzig showed that, under
certain conditions, a sequence of frequency distributions connected by a linear
recurrence formula converges to the normal distribution. Among several applications
of his results which are discussed, the author mentions their relation to
certain types of smoothing formulas, and has shown that if a linear smoothing
formula and the data to which it is applied satisfy certain conditions, the iteration
of the smoothing process produces a sequence of smoothed distributions which,
upon normalization, approaches the normal frequency curve.

In a summary paragraph at the end of the article, it is stated that “successive
application of one or many such linear formulas will usually smooth any set of
values to the normal curve of error.” The entire article was concerned with
frequency distributions, and a careful reading makes it clear that the author

ria.r ns 

¹ Annals of Math. Stat., Vol. 10 (1939), pp. 247-253.




restrictions imposed on both the original data and the smoothing formula as 
they are stated only by implication, and not explicitly, even though they have 
the effect of excluding important classes of smoothing formulas, such as those 
commonly employed by actuaries. 

The approach to the normal distribution is shown to depend on the vanishing
of a certain limit denoted as Γ, which is a function of the moments of the original
data and of a distribution in which the weights employed in the smoothing
formula are interpreted as frequencies. At this point, objection may be taken
to Dr. Dantzig's proof, since the smoothing formulas most frequently used
contain negative weights. However, it has been shown elsewhere² that the
occurrence of negative weights will not of itself prevent the sequence of smoothed
distributions from approaching the normal curve. A somewhat more serious
difficulty arises if, as is commonly the case, the smoothing formula has the
property of reproducing polynomials of a specified degree. If the degree reproduced
is two or more, this implies the vanishing of the second moment of the
weight distribution, in which case the limit Γ does not vanish. In fact, it has
been shown by DeForest³ and Schoenberg that the iteration of smoothing
formulas which reproduce polynomials of higher degree gives rise to a sequence
of limiting distributions which have the general appearance of the normal curve
in the center portion and of a damped sine curve in the tails. This is, however, at
best, a technical exception to Dantzig's statement, as one is still faced with his
basic proposition that repeated application of a smoothing formula to a frequency
distribution will cause the smoothed distribution to be dominated by the characteristics
of the smoothing formula rather than those of the original data.

While he did not intend the statement to refer to data not in the form of a
frequency distribution, some readers seem to have interpreted it as being of
general application, and, for that reason, I should like to point out a few of the
considerations involved in applying iterated smoothing to other types of data,
such as, for example, a time series or the values of a mathematical function.
The limit Γ, on whose vanishing Dantzig's theorem depends, involves the
second and fourth moments of the original data (as well as of the weight distribution)
and, therefore, can be computed only if these moments exist. For
this it is necessary (but, of course, not sufficient) that the function being smoothed
shall tend toward zero as the independent variable approaches positive or
negative infinity.

In order to iterate a smoothing formula an infinite number of times, it is 
obviously necessary to have an infinite set of original values. Therefore, in 
smoothing, for example, a finite time series, one would have to make some 
assumption regarding the values of the series outside the range for which they 

² I. J. Schoenberg, “Some analytical aspects of the problem of smoothing,” Courant
Anniversary Volume, Interscience Publishers, New York, 1948.

³ H. H. Wolfenden, “On the development of formulae for graduation by linear compounding,
with special reference to the work of Erastus L. DeForest,” Trans. Actuarial
Soc. Am., Vol. 26 (1925), pp. 81-121.





are actually available. Of course, if it were assumed that the values were zero 
outside this range, Dantzig’s theorem would apply. However, under this assump¬ 
tion, infinite iteration of a smoothing formula would not be a rational procedure, 
as it would smooth each value to zero, and the incidental fact that the sequence
of smoothed distributions, while approaching zero, also approaches the form of a
normal distribution, would not be a very valuable one. In this connection, an
important distinction between time series and frequency data is that, in dealing 
with the former, one is interested in the magnitude of individual values as well 
as in the general form and shape of the distribution. In practice it might be 
preferable not to make any assumption about the values outside the given 
range but rather to employ special devices to obtain smoothed values near the 
ends of this range. In such a case, the smoothing process would be a function 
of the range (if not of the actual values) of the original data distribution. Such a 
process was not considered by Dantzig, and is clearly excluded by his definition 
of a linear smoothing formula, which requires that the formula be completely
independent of the data to which it is applied. 

The somewhat academic question of the effect of iteration of a smoothing
formula on a function of infinite range for which the moments do not exist is a
difficult one, to which I cannot give a general answer. Schoenberg does not
consider this problem, but merely gives the weight distribution to be applied
to the original data in order to obtain the limiting smoothed distribution. Two
trivial examples may, however, serve to illustrate the nature of the considerations
involved. If the original data are values of a polynomial of a specified degree,
and if a smoothing formula which reproduces that degree is successively applied,
it will of course continue indefinitely to reproduce the original values. On the
other hand, if the smoothing formula reproduces only polynomials of lower
degree, a bias is introduced. As a simple example, we may consider the case of
smoothing the function $y = x^2$ (taken at unit intervals of $x$) by a formula
consisting of three weights each equal to 1/3 to be applied to the given value
and its two immediate neighbors. It is easily shown that the smoothed value is
$x^2 + 2/3$, and the effect of successive application of this formula is to add 2/3
each time. Thus each smoothed value would tend toward infinity as the number
of smoothings increases; however, the entire distribution would always remain
a parabola of the same form as the original.

I should like to emphasize that, in common with Dr. Dantzig, I
do not regard infinite repetition of the smoothing operation as a practical procedure,
but consider it preferable to select, in the first instance, a smoothing
formula which is likely to have the desired effect and then to perform the smoothing
in a single step. In this way, one is more likely to secure the result desired
without losing sight of important characteristics of the original data.
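
The parabola example above can be checked numerically. The following is a minimal sketch in Python, not part of the original note. It assumes that the function is tabulated at unit intervals of x, and the helper name `smooth` is hypothetical. The three-term formula with equal weights 1/3 is applied several times, and the two end values are dropped at each pass, since no assumption is made about values outside the given range. Under the unit-spacing assumption the constant added at each pass works out to 2/3.

```python
from fractions import Fraction


def smooth(values):
    """Three-term moving average with weights 1/3, 1/3, 1/3.

    The first and last values have no complete neighbourhood and are dropped.
    """
    third = Fraction(1, 3)
    return [third * (values[i - 1] + values[i] + values[i + 1])
            for i in range(1, len(values) - 1)]


# Tabulate y = x^2 at unit intervals (assumption made for this illustration).
xs = list(range(-5, 6))
ys = [Fraction(x * x) for x in xs]

passes = 3
for _ in range(passes):
    ys = smooth(ys)
    xs = xs[1:-1]  # keep the arguments aligned with the surviving values

# Difference between each smoothed value and the original parabola.
offsets = {y - x * x for x, y in zip(xs, ys)}
print(offsets)
```

Every surviving value differs from the original parabola by the same constant, so the shape is unchanged while the level rises with each pass, which is the behaviour described in the text.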





INDEPENDENCE OF QUADRATIC FORMS IN NORMALLY
CORRELATED VARIABLES¹

By Yukiyosi Kawada
Tokyo University of Literature and Science

The problem of giving a necessary and sufficient condition for two quadratic
forms in normally correlated variables to be independent has been treated by many
authors [1], [2], [3], [4], [5]. We shall give here another solution of this problem,
which may be regarded as a generalization to the general case of that given by
B. Matérn [6] for nonnegative quadratic forms.

Theorem 1. If two quadratic forms

(1) $Q_1 = \sum_{i,j=1}^{n} a_{ij} x_i x_j, \qquad Q_2 = \sum_{i,j=1}^{n} b_{ij} x_i x_j$

in normally correlated variables $x_1, \cdots, x_n$ with zero means and with the variance
matrix $I$ satisfy the following four conditions

(2) $F_{ij} = E(Q_1^i Q_2^j) - E(Q_1^i)\,E(Q_2^j) = 0 \qquad (i, j = 1, 2),$

then the relation

(3) $AB = 0 \qquad (A = (a_{ij}),\ B = (b_{ij}))$

holds.

Corollary 1. If $Q_1$, $Q_2$ in (1) satisfy the four conditions (2), then $Q_1$ and $Q_2$
are independent.

Corollary 2. (Necessity portion of the theorem of Craig.) A necessary
condition for the independence of $Q_1$ and $Q_2$ is $AB = 0$. (The sufficiency was
proved by Craig.)

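Proof of Theorem 1. The proof is very simple. Using the values $E(x_k^i) = 0$
$(i = 1, 3, 5, 7)$, $E(x_k^2) = 1$, $E(x_k^4) = 3$, $E(x_k^6) = 15$, $E(x_k^8) = 105$ $(k = 1, \cdots, n)$,
we have by a straightforward calculation² the following relations

(4) $F_{11} = 2\,\mathrm{Tr}(AB),$

(5) $F_{12} = 8\,\mathrm{Tr}(AB^2) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(B),$

(6) $F_{21} = 8\,\mathrm{Tr}(A^2B) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A),$

(7) $F_{22} = 32\,\mathrm{Tr}(A^2B^2) + 16\,\mathrm{Tr}((AB)^2) + 16\,\mathrm{Tr}(AB^2)\,\mathrm{Tr}(A) + 16\,\mathrm{Tr}(A^2B)\,\mathrm{Tr}(B)$

$\qquad\qquad + 8\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A)\,\mathrm{Tr}(B) + 8\,(\mathrm{Tr}(AB))^2.$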

¹ Presented at the Chapel Hill meeting of the Institute of Mathematical Statistics and
Biometric Society, March 18, 1950.

² If we apply an orthogonal transformation to $(x_1, \cdots, x_n)$ so that $A$ becomes a diagonal
form, the calculation becomes simpler than with the general form. We may note here also
the fact that we need not assume that $x_1, \cdots, x_n$ are normally correlated, but we use
only the values of $E(x_k^i)$ $(i = 1, \cdots, 8)$ for our proof.





Put $C = AB$. Let $C'$ be the transposed matrix of $C$. We have from (2), (4)-(7)

(8) $2\,\mathrm{Tr}(A^2B^2) + \mathrm{Tr}((AB)^2) = 2\,\mathrm{Tr}(CC') + \mathrm{Tr}(C^2) = 0.$

The left side of (8) is equal to $\sum_{i,j=1}^{n} (c_{ij}^2 + c_{ij}c_{ji} + c_{ji}^2)$, which is positive unless
all $c_{ij} = 0$ $(i, j = 1, \cdots, n)$. Hence we have $C = AB = 0$, q.e.d.

Corollary 1 follows from Theorem 1 and the theorem of Craig. Corollary 2
results from observing that independence of $Q_1$ and $Q_2$ implies (2).
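
The first of the relations used in the proof, that the covariance of the two forms equals twice the trace of AB, can be checked exactly for small matrices. The sketch below is an illustration added here, not the author's calculation. It assumes independent standard normal variables (variance matrix I) and uses the standard fourth-moment rule for such variables; the function names and the two test matrices are hypothetical.

```python
from itertools import product


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def fourth_moment(i, j, k, l):
    # E(x_i x_j x_k x_l) for independent standard normal variables:
    # one unit for each way of pairing the four indices into equal pairs.
    return int(i == j and k == l) + int(i == k and j == l) + int(i == l and j == k)


def covariance_of_forms(a, b):
    """E(Q1 Q2) - E(Q1) E(Q2) for Q1 = x'Ax, Q2 = x'Bx, computed exactly."""
    n = len(a)
    e_q1_q2 = sum(a[i][j] * b[k][l] * fourth_moment(i, j, k, l)
                  for i, j, k, l in product(range(n), repeat=4))
    # E(Q) equals the trace of its matrix, because E(x_i x_j) is 1 if i == j else 0.
    return e_q1_q2 - trace(a) * trace(b)


# Two arbitrary symmetric matrices chosen for illustration.
A = [[2, 1, 0], [1, 0, -1], [0, -1, 3]]
B = [[1, 2, 0], [2, -1, 1], [0, 1, 1]]

f11 = covariance_of_forms(A, B)
print(f11, 2 * trace(matmul(A, B)))
```

For a pair with AB = 0, such as A = diag(1, 0) and B = diag(0, 1), the same function returns zero, in agreement with the independence of the two forms.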

B. Matérn proved that if $A$, $B$ are nonnegative, then $AB = 0$ follows from the
single condition $F_{11} = 2\,\mathrm{Tr}(AB) = 0$. If only one of the matrices $A$, $B$ is assumed
to be nonnegative, we have

Theorem 2. Let $A$ be nonnegative. Then from the two conditions $F_{11} = 0$, $F_{12} = 0$
in (2) follows the relation $AB = 0$.

Proof. From (4), (5) follows $\mathrm{Tr}(AB^2) = 0$. Since $A$ is nonnegative, we can
choose a real symmetric matrix $A_0$ such that $A = A_0^2$. Put $C_0 = A_0 B$. Then
we have $\mathrm{Tr}(AB^2) = \mathrm{Tr}(C_0 C_0') = 0$ and from this follows $C_0 = 0$. Hence we have
also $AB = A_0 C_0 = 0$, q.e.d.


REFERENCES 

[1] A. T. Craig, “Note on the independence of certain quadratic forms”, Annals of Math.
Stat., Vol. 14 (1943), pp. 195-197.

[2] H. Hotelling, “Note on a matric theorem of A. T. Craig”, Annals of Math. Stat.,
Vol. 15 (1944), pp. 427-429.

[3] H. Sakamoto, “On the independence of two statistics”, Research Memoirs of Inst. of
Stat. Math., Tokyo, Vol. 1 (1944), pp. 1-25 (in Japanese).

[4] K. Matusita, “Note on the independence of certain statistics”, Annals of Inst. of
Stat. Math., Tokyo, Vol. 1 (1949), pp. 79-82.

[5] J. Ogawa, “On the independence of bilinear and quadratic forms of a random sample
from a normal population”, Annals of Inst. of Stat. Math., Tokyo, Vol. 1 (1949),
pp. 83-108.

[6] B. Matérn, “Independence of non-negative quadratic forms in normally correlated
variables”, Annals of Math. Stat., Vol. 20 (1949), pp. 119-120.


ERRATA TO “CONTROL CHART FOR LARGEST 
AND SMALLEST VALUES” 

By John M. Howell 

Los Angeles City College 

In the paper cited in the title (Annals of Math. Stat., Vol. 20 (1949), p. 306),
there are some numerical errors in Table I. Values of dj/2 and di are given by
H. J. Godwin in “Some Low Moments of Order Statistics” in the same issue
of the Annals. These values are more accurate than those heretofore available.
A corrected Table I based on these values is as follows:


 n                                                        n

 2     1.1284     .8256     1.8800     2.6951     3.0411     2
 3     1.6926     .7480     1.0233     1.8258     3.0902     3
 4     2.0588     .7012      .7286     1.5218     3.1330     4
 5     2.3259     .6690      .5768     1.3629     3.1699     5
 6     2.5344     .6449      .4832     1.2634     3.2020     6
 7     2.7043     .6260      .4193     1.1945     3.2303     7
 8     2.8472     .6107      .3725     1.1434     3.2556     8
 9     2.9700     .5978      .3367     1.1038     3.2784     9
10     3.0775     .5868      .3083     1.0720     3.2992    10


ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Berkeley meeting of the Institute,
August 5, 1950)

1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum,
University of Washington, Seattle.

In cluster sampling it is usually assumed that the clusters are disjoint. In this paper
situations are considered in which this assumption is not fulfilled. Let the population $\pi$
consist of $N$ individuals "j", having the variates $V[j]$, $j = 1, 2, \cdots, N$, and let $K$ clusters
$C[i]$, $i = 1, 2, \cdots, K$, be such that each "j" belongs to at least one cluster. Let $s[j] \geq 1$
be the number of different clusters to which "j" belongs (the multiplicity of "j"). The
cluster $C[i]$ contains $N_i$ individuals with the variates $V[i, t]$, $t = 1, 2, \cdots, N_i$;
$i = 1, 2, \cdots, K$. In a sampling procedure, let sub-sample sizes $n[i]$ be given for each $C[i]$,
and weights $\lambda[i, t]$ for each $V[i, t]$; a random sample of $k$ clusters $C[i_u]$, $u = 1, 2, \cdots, k$,
is obtained, then $n[i_u]$ individuals are sampled from $C[i_u]$, and for each of them its variate
and its multiplicity are recorded. Necessary and sufficient conditions are derived for
$S = \sum V[i_u, t_v]\,\lambda[i_u, t_v]$ being an unbiased estimate of $\bar{V} = \frac{1}{N} \sum_{j} V[j]$. The
variance of $S$ is found, the weights are studied which minimize this variance, and some
practically important special cases are derived.

2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University
of Stockholm.

Consider a sample of size $n$ from a two-dimensional distribution $F(x, y)$. Let $\tilde{x}$ and $\tilde{y}$
denote the two sample medians and compute the number of individuals, say $k$, satisfying
the inequality $x < \tilde{x}$, $y < \tilde{y}$ (the trivial difficulty arising when $n$ is an odd number can
easily be overcome). A test of independence based on $k$ is nonparametric. As a matter of
fact one has under the null hypothesis that








where $m = [n/2]$. In the case of normal $F$ with correlation coefficient $\rho$ it is possible to
show, by studying the asymptotic behavior of the power function of the test in the neighborhood
of $\rho = 0$, that the asymptotic efficiency of the test is $(2/\pi)^2$, or about 41%. This
result is based on the fact that $k$ has an asymptotically normal distribution if some regularity
conditions are fulfilled. In spite of its low efficiency it is suggested that the test be
used in cases where some information can be neglected in favor of the simplicity of the
method.
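
The counting step described in this abstract can be sketched as follows. This is an illustration only, assuming an even sample size and no ties with the medians; the function name `quadrant_count` and the sample values are hypothetical and do not come from the paper. The null distribution of the count is not reproduced here.

```python
from statistics import median


def quadrant_count(xs, ys):
    """Number k of sample points lying strictly below both sample medians."""
    x_med, y_med = median(xs), median(ys)
    return sum(1 for x, y in zip(xs, ys) if x < x_med and y < y_med)


xs = [1, 2, 3, 4, 5, 6]
ys_increasing = [10, 20, 30, 40, 50, 60]   # y rises with x
ys_mixed = [20, 10, 40, 30, 60, 50]        # partly scrambled
ys_decreasing = [60, 50, 40, 30, 20, 10]   # y falls as x rises

k_increasing = quadrant_count(xs, ys_increasing)
k_mixed = quadrant_count(xs, ys_mixed)
k_decreasing = quadrant_count(xs, ys_decreasing)
print(k_increasing, k_mixed, k_decreasing)
```

With six points, half of them lie below the median of x; the count ranges from all of these (strong positive association) to none of them (strong negative association).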


3. On Minimax Statistical Decision Procedures and Their Admissibility. Colin
R. Blyth, University of California, Berkeley.

The problem considered is that of using a sequence of observations on a random variable
$X$ to make a decision. Two loss functions $W_1$ and $W_2$, each depending on the distribution
$F$ of $X$, the number $n$ of observations taken, and the decision $\delta$ made, are assumed given.
Minimax problems can be stated for weighted sums of $W_1$ and $W_2$, or for either one subject
to an upper bound on the expectation of the other. Under suitable conditions it is shown
that solutions of the first type of problem provide solutions for all problems of the latter
types, and that admissibility for a problem of the first type implies admissibility for problems
of the latter types. Two examples are given: estimation of $EX$ when $X$ is (1) normal
with known variance, (2) rectangular with known range. The two loss functions are in
each case $W_2 = n$ and an arbitrary nondecreasing function $W_1(|\xi - \delta|)$. Admissible
minimax estimates are obtained. Extensions to any function $W_1(u)$ are indicated; two
examples are given for the normal case where the sample size must be randomized among
more than a consecutive pair of integers.


4. Sufficient Statistics and Unbiased Estimates for “Selected” Distributions.
Douglas G. Chapman, University of Washington, Seattle.

A family of distributions obtained from any given family by fixed selection may be
called a “selected” family. Tukey's theorem that such selected families admit the same
set of sufficient statistics as the parent family is proved for an extended class of distributions.
Further, if the selection does not involve truncation, the existence of minimum variance
unbiased estimates of parameters of the parent family ensures the existence of similar
estimates for the selected family. Some results are derived for minimum variance unbiased
estimates for truncated distributions.

5. The Unattainability of Certain Lower Bounds by Product Densities. R. C.
Davis, U. S. Naval Ordnance Testing Station, China Lake.

Under weak regularity conditions it is shown that for the case in which the sample size
is a nonrandom variable, certain lower bounds are unattainable. Consider a univariate
chance variable $X$, possessing an absolutely continuous distribution function $F(x, \theta)$,
$\theta$ being the unknown parameter. Under quite general regularity conditions Barankin
an additional wenk i^umption coneer - ■■' ■ fl ) (obtained by Barankm) 

*e^ed . 





variables and for which $g(x_1, x_2, \cdots, x_n)$ attains for each $n$ the special lower bound given
by Barankin. Obviously in the case $s = 2$, the lower bound is achieved by an efficient statistic
if one exists.

6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards,
University of California, Berkeley.

Values obtained by using the normal approximation to the noncentral t-distribution
given by Johnson and Welch were compared with exact values given by Neyman and
Tokarska. The comparison indicated that efficiencies of the sign test computed from the
approximation would be consistently higher than the true efficiencies. To avoid this bias
the sign test was randomized so that levels of significance of $\alpha = .05$ and $\alpha = .01$ were
obtained and the exact values of the noncentral $t$ used. Efficiencies were computed using
various measures of equivalence of the power functions: (1) balancing the area (Walsh),
(2) minimizing the maximum difference, (3) equalizing the power at certain fixed points.
The various measures of equivalence yielded no marked differences in efficiencies. Tables
were given of the efficiencies for small $n$. The efficiency for $\alpha = .05$ was about .7 for $n$ between
0 and 20 and somewhat higher for $\alpha = .01$. The efficiency slowly approaches the
asymptotic value of $2/\pi = .6366$ as $n$ increases.
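
As a quick arithmetic check, added here for illustration, the asymptotic efficiency of the sign test quoted in this abstract is 2 divided by pi, and the figure of about 41% quoted in abstract 2 for the median-based test of independence is the square of the same quantity. The variable names below are hypothetical.

```python
import math

sign_test_limit = 2 / math.pi            # asymptotic efficiency of the sign test
median_test_limit = sign_test_limit ** 2  # value quoted for the test in abstract 2

print(round(sign_test_limit, 4))
print(round(median_test_limit, 3))
```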

7. About Some Classes of Sequential Procedures for Obtaining Confidence
Intervals of Given Length. (Preliminary Report). Werner R. Leimbacher,
University of California, Berkeley.

The special class $C_1$ of such procedures indicated by A. Wald (Sequential Analysis, John
Wiley & Sons, 1947, pp. 146-15(1) can be extended by generalizing and improving the inequality
on which the procedures are based. It is shown that even in this larger class $C_2$,
a procedure could possibly be optimum only under very special circumstances. The well
known optimum procedure for a normal distribution $N(\theta, 1)$ can be obtained as the limit
of a sequence of procedures from $C_2$. For the suggested sequence, however, the limit no
longer belongs to $C_2$. In order to eliminate various deficiencies of $C_2$, a modified class $C_3$
is proposed which contains the well known optimum procedures for the normal and rectangular
distributions. The method indicated seems suggestive for the general case of
estimating location parameters by confidence intervals.

8. On the Stochastic Independence of Symmetric and Homogeneous Linear
and Quadratic Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing
Station, China Lake.

It is known that the sampling distributions of the mean and of the variance are stochastically
independent if and only if the parent distribution is normal. This was proven by
R. C. Geary (Jour. Roy. Stat. Soc., Suppl., Vol. 3 (1936)), and using a different method by
E. Lukacs (Annals of Math. Stat., Vol. 13 (1942)). The question arises whether there are
any distributions having the property that the sampling distributions of the mean and of a
symmetric and homogeneous quadratic statistic are independent. It can be shown that
there are only the following possibilities: (1) the parent distribution is normal, (2) the
parent distribution is degenerate with a single saltus of one, (3) the parent distribution is
a step function with two steps, located symmetrically with respect to zero, (4) the parent
distribution is a gamma distribution.

9. The Distribution of the Maximum Deviation between Two Sample Cumulative
Step Functions. Frank J. Massey, Jr., University of Oregon.

Let $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_m$ be the ordered results of two random
samples from populations having continuous cumulative distribution functions $F(x)$ and





$G(x)$ respectively. Let $S_n(x) = k/n$ where $k$ is the number of observed values of $X$ which
are less than or equal to $x$, and similarly let $S'_m(y) = j/m$ where $j$ is the number of observed
values of $Y$ which are less than or equal to $y$. The statistic $d = \max_x |S_n(x) - S'_m(x)|$ can
be used to test the hypothesis $F(x) = G(x)$, where the hypothesis would be rejected if the
observed $d$ is significantly large. In this paper a method of obtaining the exact distribution
of $d$ for small samples is described, and a short table for equal size samples is included.
The general technique is that used by the author for the single sample case. There is a
lower bound to the power of the test against any specified alternative. This lower bound
approaches one as $n$ and $m$ approach infinity, proving that the test is consistent.
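
The statistic defined in this abstract, the largest vertical distance between the two sample step functions, can be computed directly. The sketch below is an illustration under the abstract's definitions, not the author's tabulation method; the function names and the sample values are hypothetical. Because both step functions change only at observed values, it is enough to evaluate the difference at those points.

```python
from bisect import bisect_right


def step_function(sample):
    """Return S(t): the proportion of sample values less than or equal to t."""
    ordered = sorted(sample)
    size = len(ordered)
    return lambda t: bisect_right(ordered, t) / size


def max_deviation(xs, ys):
    """Largest absolute difference between the two sample step functions."""
    s_x, s_y = step_function(xs), step_function(ys)
    # Both functions are constant between observations, so the maximum
    # is attained at one of the observed values.
    return max(abs(s_x(t) - s_y(t)) for t in list(xs) + list(ys))


overlapping = max_deviation([1, 2, 3, 4], [3, 4, 5, 6])
separated = max_deviation([1, 2, 3, 4], [5, 6, 7, 8])
print(overlapping, separated)
```

Two samples with no overlap give the largest possible value, 1, while identical samples give 0.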


10. An Iterative Construction of the Optimum Sequential Decision Procedure
with Linear Cost Function. Lincoln E. Moses, Stanford University.

Where the cost of taking $n$ observations is proportional to $n$, define a sequential decision
procedure $D_T$ by means of its associated “stopping region” $T$. $T$ is the set of a posteriori
probability distributions $\xi(\theta)$ for which $D_T$ instructs the statistician to take no observation
and to make the decision which minimizes the Bayes risk. Now let $D_T$ be any sequential
decision procedure which has uniformly bounded average risk for every a priori distribution
$\xi(\theta)$. Define $T'$ as the derived region of $T$: $T'$ is the set of $\xi(\theta)$ such that the Bayes
risk of stopping at $\xi(\theta)$ is not greater than the risk of taking one observation and then
using $D_T$. Define $T^{(n+1)} = (T^{(n)})'$. Then it is shown that the sequence of regions $\{T^{(n)}\}$,
$n = 1, 2, \cdots$, is monotonically decreasing to a limit region $T^{\infty}$, and that $D_{T^{\infty}}$ is the optimum
sequential decision procedure. Some numerical examples are given where the exact solution
is obtained and the convergence of the iteration is examined. (This paper was prepared
under the sponsorship of the Office of Naval Research.)


11. On the Law of the Iterated Logarithm for Dependent Random Variables.
Stanley W. Nash, University of California, Berkeley.

The order of the remainder term is evaluated in the distribution function of the asymptotically
normal sum $S_n$ of dependent random variables of a certain class considered by
Loève. Bounds are found for the probability that $\max |S_n| \geq B_n x$, where $B_n$ is the sum
of the variances of components of $S_n$. Given an infinite sequence of events $A_n$, a necessary
and sufficient condition is found for the probability that infinitely many $A_n$
occur to equal one. This criterion extends criteria due to Borel. With these results established
the law of the iterated logarithm is shown to hold for a wide subclass of Loève's
class of dependent random variables. Within this class the partial sum - fl. may approach
normality with a speed which depends in a certain functional way on the previous
sum $S_i$, and which may be arbitrarily slow for some values of $S_i$. The conclusions generalize
earlier results due to W. Doeblin and N. A. Sapogov.

12. Conditional Expectation and the Efficiency of Estimates. Paul G. Hoel, 
University of California, Los Angeles. 

proving 'o' /doeVn”/yield an essentially better estunate than a well 

known estimate. 





13. Optimum Estimates for Location and Scale Parameters. Raymond P.
Peterson, University of California and National Bureau of Standards,
Los Angeles.

Let $h_i(W \mid E, \theta) = W(\theta_i^*, \theta)\,p(E \mid \theta)$, where $p(E \mid \theta)$ is the joint probability density
function of the $n$ (not necessarily independent) sample values $x_1, \cdots, x_n$ which may be
represented as a point $E = (x_1, \cdots, x_n)$ in the $n$-dimensional Euclidean sample space $M$.
The unknown parameters, $\theta_1, \cdots, \theta_s$, may be represented as a point $\theta = (\theta_1, \cdots, \theta_s)$
in the $s$-dimensional Euclidean parameter space $\Omega$. $W(\theta_i^*, \theta)$ is a real-valued, nonnegative,
measurable weight function, defined for all $E$ in $M$ and $\theta$ in $\Omega$, which represents the relative
seriousness of taking the estimate $\theta_i^*(E)$ as the value of $\theta_i$ for any particular sample
point $E$. Let $G(\theta)$ be the unknown cumulative distribution function of $\theta$. Then $\theta_i^*(E)$ is
defined to be a best estimate of $\theta_i$, provided that, if $\theta_i(E)$ is any other estimate of $\theta_i$ in
the class under consideration, $I - I^* \geq 0$, where

$I = \int_{\Omega} \int_{M} h_i(W \mid E, \theta)\,dE\,dG(\theta).$

Let

$r_i(\theta) = \int_{M} h_i(W \mid E, \theta)\,dE, \qquad \varphi_i(E) = \int_{\Omega} h_i(W \mid E, \theta)\,d\theta.$

A general theorem is proved to the effect that if $h_i(W \mid E, \theta)$ is measurable over the product
space $M \times \Omega$ and if $r_i(\theta)$ and $\varphi_i(E)$ are uniformly convergent integrals, then a best estimate
$\theta_i^*(E)$ of $\theta_i$ exists provided that $r_i(\theta)$ is constant and that $\theta_i^*(E)$ minimizes $\varphi_i(E)$ for all
points $E$ in $M$. General methods are obtained for constructing best estimates for location
and scale parameters, separately or jointly, and for functions of location and scale parameters
from several populations. As special cases, results are derived which are analogous
to converses of Theorems 1 and 2 in Kallianpur's “Minimax Estimates of Location and
Scale Parameters”, Abstract (Annals of Math. Stat., Vol. 21 (1950), pp. 310-311).


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Professor William Feller of Cornell University has been appointed Eugene 
Higgins Professor of Mathematics at Princeton University. 

Dr. Leonard Kent, formerly on the staff at the University of Chicago in the 
School of Business, is now with the firm of Alderson and Sessions, 1906 Walnut 
Street, Philadelphia 3, Pennsylvania. 

Dr. G. B. Oakland has resigned an associate professorship of statistics at the 
University of Manitoba to accept the position as Head of Biometrics Unit, 
Division of Administration, Department of Agriculture, Ottawa. 

Dr. Norman Rudy has accepted an appointment as Assistant Professor at 
Sacramento State College, Sacramento, California. 

Professor G. R. Seth has returned to India to accept the position of Professor 
of Statistics and Deputy Statistical Advisor to the Indian Council of Agricultural 
Research, New Delhi. 





Mr. Eric Weyl, textile engineering consultant, formerly of Manchester, New 
Hampshire, has moved his office to 2509 Vail Avenue, Charlotte, North Carolina. 
Mr. Weyl, a specialist in cotton spinning, serves as regular consultant to many 
leading textile mills. 


The completion and successful operation of SEAC—the National Bureau of 
Standards Eastern Automatic Computer—has been achieved by electronic scien¬ 
tists of the National Bureau of Standards. SEAC is a high-speed, general-purpose, 
automatically-sequenced electronic computer. It was developed and constructed, 
in a period of 20 months, by the staff of the National Bureau of Standards under 
the sponsorship of the Department of the Air Force to provide a high-speed 
computing service for Air Force Project SCOOP (Scientific Computation of 
Optimum Programs), a pioneering effort in the application of scientific principles 
to the large-scale problems of military management and administration. SEAC 
will also be available for solving important NBS problems of general scientific 
and engineering interest. 


New Members 

The following persons have been elected to membership in the Institute
(June 1, 1950 to August 31, 1950)

Avon, Russell E., M.A. (Univ. of Miss.), Graduate student, University of Mississippi,
1511 North Main St., Water Valley, Mississippi.

Bamberger, Gunter, Dipl.-Math. (Univ. Göttingen), Division head in the Statistical Office
of the City of Cologne, Mandcrschadcr Plats IS, Cologne-Suls, Germany.

Bangdiwala, Ishver S., M.S. (Univ. of N. C.), Graduate student, University of North Carolina,
210 A Phillips Hall, University of North Carolina, Chapel Hill.

Borch, Karl Henrik, M.Sc. (Oslo Univ.), Field Science Officer for Middle East, UNESCO,
19 Avenue Kléber, Paris 16e, France.

Buch, Kai R., M.Sc., Assistant Professor, Technical University of Denmark, Eigaardsvej
H A >, Charlottenlund, Denmark.

Carranza, Roque G., Ingeniero Industrial (Univ. Buenos Aires), Consultant Industrial
Engineer, Parana 56, Buenos Aires, Argentina.

Dominguez, Alberto G., Ph.D. (Univ. Buenos Aires), Professor of Mathematics, Facultad
de Ciencias Exactas, Fisicas y Naturales, University of Buenos Aires, Paraguay 1527,
Buenos Aires, Argentina.

Dunaway, William L., B.S. (Univ. of Calif.), Graduate student, Dept. of Mathematical
Statistics, University of California, 1,520 Cahuenga Boulevard, North Hollywood, California.

Fernandez, Jose J., Professor, University of Costa Rica, Ap. ISIS, San Jose, Costa Rica.
Fortet, Robert, Ph.D (Paris), Professor, Department of Science de Caen, 168 Rue Capo- 

- Ci....n), Lecturer,'Univaruity ot Frankfurt; B-d 

'TstSu^rLy Kerokhuff-luetitu^, Bud N.uh«ru; Lecturer, T.«b.,ca1 

Mathematics and Dean uf the Faculty of Natural Sciences andMathematiee, Umvere.ty 





of Freiburg i. Br., Manager of “Gesellschaft für angewandte Mathematik und Mechanik”,
Stndbtmssc S7, Freiburg i. Br., Germany.

Gullbaud, George T,, Agrege tip 1 Univ n’uris), Chief, Surf ion a 1’Iimlilule of Science 
Economruur Appliance, Paris, ami Professor, Institute of Statmties, Uiiivcrsily of 
Paris, 38 Boulevard dm Ciipurinrn, Paris 2, France 

Holloway, Clark, Jr.,M,S. (Univ. of Ill ),Prnruss Research Engineer, Gulf Research ami 
Development Co , P tl HOUR, Pittsburgh RO, Prnnityluinia. 

Llebernian, Gilbert,M.A. (Columbia Untv.), Mathematician, U. S. Naval Research Lai •ora¬ 
tory, iW Xnccmnh Ml., E.E , Washington 20, D.C. 

Lomax, K. S., M A. (Manchester t 7 niv,l, lairttirer in Economic Blutiaties, Economies De¬ 
partment, The University, Manchester, England. 

Lorenz, Paul, Ph.D., Professor, University of Re.rlin, KaiaerstuhlatruRBc 21, Berlin-Schlueh- 
tensne, Germany 

Lunger, George F..M.M.A. (Univ. of Midi.), Statistician, Great Lakes Investigations, Fish 
and Wildlife Service, Department of Lire Interior, 21 It) Arlwr Virut Bled,, Ann Arbor, 
Michigan. 

Maggy, Robert K,, M A. (Univ. of Calif,), Graduate student, University of California, 
Hi 85 Euclid Avenue, Berkeley 0, California. 

McElrath, Gayle W., M.K, (Univ. of Mich,), Assistant Profeasor, Department of Engineer¬ 
ing, 21IS Main Engineering Building, Univeraily of Minnesota, Minneapolis, Minnesota 

Nelslus, W. Vincent, M S, (Emory Univ.), Mathematics Instructor, Georgia Inatituto of 
Technology, 887 El, Charles Armin', .V.E., Atlanta R, Georgia. 

Perloff, Robert, M,A, (Ohio ,Stale Univ,), Graduate student and Research Assistant, Re¬ 
search Foundation, Ohio Hlate University, 1281 Bryilen Road, Columbus 5, Ohio. 

Peter, Hans, Di, i or. pol., Professor of Economics, University of Tubingen, Tulnnge.n- 
Wahlhausrn 29, Germany. 

Putter, Joseph, M.Ho. (Hebrew Univ,, Jerusalem), International House, Berkeley 4, Cali¬ 
fornia. 

Rankin, Bayard, A.R, (Univ. of Calif.), Graduate student, University of California, Inter¬ 
national House, Berkeley 4, California. 

Reid, Albert T., B.S. (Iowa State College), Research Assistant in Mathematical Biology, 
Committee on Mathematical Biology, University of Chicago, 5741 Drexol Avenue, 
Chicago 37, Illinois. 

Shaw, Albert, B.S. (Univ. of Alberta), Lecturer, University of Alberta, Department of 
Mathematics, University of Alberta, Edmonton, Alberta, Canada. 

Shuhany, Elizabeth, A.M. (Boston Univ.), Assistant Instructor in Statistics and Assistant 
in Statistical Laboratory of Mathematics, Boston University, 725 Commonwealth 
Avenue, Boston 15, Massachusetts. 

Stewart, John N., B.A. (Univ. of Michigan), Graduate student, University of Michigan, 
4834 Chatsworth, Detroit 24, Michigan. 

Strecker, Heinrich, Doctor der Naturwissenschaften (Univ. München), Mathematical 
Statistician in the Bavarian Statistical Office, Rosenheimerstrasse 130, Munich 8, 
Germany. 

Vaswani, Sundri (Miss), Ph.D. (Univ. of London), Research Associate in Statistics, c/o 
Ahmedabad Textile Industry’s Research Association, P.O. Box 170, Ahmedabad, India. 


REPORT OF THE BERKELEY MEETING OF THE INSTITUTE 

The forty-fourth meeting of the Institute of Mathematical Statistics was 
held on August 5, 1950, on the Berkeley campus of the University of California, 
in conjunction with the Second Berkeley Symposium on Mathematical Statistics 
and Probability, which met from July 31 through August 12. Other organizations 
cooperating with the Symposium were the Biometrics Section of the American 
Statistical Association, the Western North American Region of the Biometric 
Society, the Econometric Society, the Institute of Transportation and Traffic 
Engineering of the University of California, and the Office of Naval Research. 
Some 218 persons registered for the Symposium, including the following 106 
members of the Institute: 


T. W. Anderson, Fred C. Andrews, Jane F. Andrian, Kenneth J. Arrow, Edward W. 
Barankin, Helen P. Beard, Robert D. Bedwell, Blair M. Bennett, Joseph Berkson, Z. W. 
Birnbaum, David Blackwell, E. Blanco, Nils Blomqvist, Julius R. Blum, Colin R. Blyth, 
A. H. Bowker, George W. Brown, Douglas G. Chapman, C. L. Chiang, K. L. Chung, William 
G. Cochran, Harald Cramér, Edwin L. Crow, J. H. Curtiss, R. C. Davis, W. J. Dixon, J. L. 
Doob, A. Dvoretzky, Mary Elveback, Benjamin Epstein, Mark W. Eudey, Edward A. Fay, 
William Feller, Edgar H. Fickenscher, E. Fix, William R. Gaffey, Robert S. Gardner, S. G. 
Ghurye, M. A. Girshick, Paul Gutt, Jack C. Gysbers, T. E. Harris, J. L. Hodges, Jr., Wassily 
Hoeffding, Paul G. Hoel, Harold Hotelling, John M. Howell, Harry M. Hughes, R. F. 
Jarrett, T. A. Jeeves, Mark Kac, Joseph Kampé de Fériet, E. S. Keeping, Ryoichi Kikuchi, 
Wilfred M. Kincaid, H. S. Konijn, Charles H. Kraft, George M. Kuznets, E. L. Lehmann, 
Roy B. Leipnik, Paul Levy, M. Loève, Arvid T. Lonseth, Eugene Lukacs, C. A. Magwire, 
Jacob Marschak, Thomas Marschak, F. J. Massey, Jr., A. M. Mood, Lincoln E. Moses, 
James T. McWilliam, Stanley W. Nash, J. Neyman, Howard C. Nielson, Gottfried E. 
Noether, Stefan Peters, John C. Petersen, Raymond P. Peterson, Robert I. Piper, Joseph 
Putter, Robert R. Putz, Bayard Rankin, Fred D. Rigby, David Rubinstein, Elizabeth L. 
Scott, Esther Seiden, Arthur Shapiro, Richard H. Shaw, Ronald W. Shephard, W. B. Simpson, 
Monroe Sirken, M. Sobel, Herbert Solomon, A. L. Stewart, Donald E. Stiling, G. 
Szego, Robert Tate, William F. Taylor, Leo J. Tick, A. W. Tucker, Elizabeth Vaughan, 
Shanti A. Vora, Abraham Wald, Allen Wallis, J. Wolfowitz, Miriam L. Yevick. 

Because of the extensive program of more than fifty invited addresses at the 
Symposium, the Institute meeting was devoted only to contributed papers. 
Professor David Blackwell of Howard and Stanford Universities presided at 
the Institute meeting, at which the following program was presented: 


1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum, University of 
Washington, Seattle. 

2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University of Stockholm. 

3. On Minimax Statistical Decision Procedures and their Admissibility. Colin R. Blyth, 
University of California, Berkeley. 

4. Sufficient Statistics and Unbiased Estimates for “Selected” Distributions. Douglas G. 
Chapman, University of Washington, Seattle. 

5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. Davis, U. S. 
Naval Ordnance Testing Station, China Lake. 

6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards, University 
of [illegible]. 

7. About Some Classes of Sequential Procedures for Obtaining Confidence Intervals of Given 
Length. (Preliminary report). Werner R. Leimbacher, University of California, Berkeley. 

8. On the Stochastic Independence of Symmetric and Homogeneous Linear and Quadratic 
Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing Station, China Lake. 



9. The Distribution of the Maximum Deviation Between Two Sample Cumulative Step 
Functions. Frank J. Massey, Jr., University of Oregon. 

10. An Iterative Construction of the Optimum Sequential Decision Procedure, with Linear 
Cost Function. Lincoln E. Moses, Stanford University. 

11. On the Law of the Iterated Logarithm for Dependent Random Variables. Stanley W. 
Nash, University of California, Berkeley. 

12. Conditional Expectation and the Efficiency of Estimates. (By title). Paul G. Hoel, 
University of California, Los Angeles. 

13. Optimum Estimates for Location and Scale Parameters. (By title). Raymond P. Peterson, 
University of California and National Bureau of Standards, Los Angeles. 

The social activities at the Symposium included a tea on August 1, an excursion 
on August 3, a dinner on August 7, a picnic on August 9, and coffee on 
July 31 and August 2, 4, 7, 8, 10, and 11. 

J. L. Hodges, Jr., 
Associate Secretary 




